cyclic landscape: Receiving quoted-printable mail on GAE for Java

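While writing mail-receiving code on Google App Engine for Java, I ran into a problem: when I decoded a mail body encoded as quoted-printable, the decoded result was cut off partway through. Looking into it, I found that the cause was that what the quoted-printable encoding calls "soft line breaks" were not being recognized properly, so I put in a workaround.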

First, soft line breaks are described as follows in the RFC's rules for the quoted-printable encoding. In short, the encoder splits lines that are too long by inserting "=" followed by a line break wherever needed.

Rule #5 (Soft Line Breaks):
The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long. If longer lines are to be encoded with the Quoted-Printable encoding, 'soft' line breaks must be used. An equal sign as the last character on a encoded line indicates such a non-significant ('soft') line break in the encoded text.
RFC 1521: MIME Part One
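To make rule #5 concrete, here is a minimal, self-contained sketch of a quoted-printable decoder that drops soft line breaks ("=" followed by CRLF or LF) and decodes "=XX" hex escapes. It is my own illustration, not the JavaMail or Commons Codec implementation; the class SoftLineBreakDecoder and its methods are hypothetical names, and for brevity it skips rule #3 (removing trailing whitespace) and does not accept whitespace between "=" and the line break.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical minimal quoted-printable decoder (illustration only):
 * removes soft line breaks (RFC 1521 rule #5) and decodes "=XX" escapes.
 * Rule #3 (trailing whitespace) is intentionally not implemented.
 */
final class SoftLineBreakDecoder {

    // Value of one hex digit, or -1 if the byte is not a hex digit.
    private static int hexValue(byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        return -1;
    }

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {
                out.write(b);                       // ordinary byte, hard line breaks included
                i++;
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;                             // soft line break "=" CRLF: drop it
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;                             // soft line break "=" LF: drop it
            } else if (i + 2 < in.length && hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
                out.write(hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));  // "=XX" -> one octet
                i += 3;
            } else {
                out.write(b);                       // malformed "=": keep it (lenient)
                i++;
            }
        }
        return out.toByteArray();
    }

    static String decodeToString(String encoded, String charsetName) {
        byte[] raw = encoded.getBytes(StandardCharsets.US_ASCII);
        return new String(decode(raw), Charset.forName(charsetName));
    }
}
```

For example, decodeToString("=E3=81=82=\r\n=E3=81=84", "UTF-8") yields "あい": the "=\r\n" in the middle is removed instead of turning into a line break, while a plain "\r\n" without a preceding "=" is kept as a real line break.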

Although the RFC says this, when I tried it, neither MimeUtility.decode() in JavaMail 1.4 nor QuotedPrintableCodec.decode() in Commons Codec 1.4 handled soft line breaks correctly: the decoded result stopped at the first soft line break. Wondering why, I read the QuotedPrintableCodec documentation properly, and there it was, stated explicitly:

Note:
Rules #3, #4, and #5 of the quoted-printable spec are not implemented yet because the complete quoted-printable spec does not lend itself well into the byte[] oriented codec framework.
QuotedPrintableCodec (Commons Codec 1.4 API)

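So the soft line breaks have to be removed by hand before decoding. For example, decoding a MimeBodyPart taken from a MimeMessage can look like this. It is not efficient code, but so far it seems to work without problems.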

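// imports needed by the method below (at the top of the enclosing class file)
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import javax.mail.MessagingException;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeUtility;

// Decode the body of a MimeBodyPart into a String.
// For quoted-printable parts, soft line breaks are stripped before decoding.
private String _decodeBody(MimeBodyPart bp)
{
  // parse header
  String contentType = null;
  String contentEncoding = null;
  String charset = null;
  try{
    contentType = bp.getContentType();
    contentEncoding = bp.getEncoding();
  }catch(MessagingException e){
  }
  // extract the charset parameter, e.g. text/plain; charset="ISO-2022-JP"
  if(contentType!=null){
    String[] elems = contentType.split(";");
    for(String elem : elems){
      String param = elem.trim();
      if(param.toLowerCase().startsWith("charset=")){
        charset = param.substring("charset=".length());
      }
    }
  }
  if(charset!=null){
    if(charset.startsWith("\"")) charset = charset.substring(1);
    if(charset.endsWith("\"")) charset = charset.substring(0, charset.length()-1);
  }
  // get the raw (still encoded) input stream
  InputStream in = null;
  try{
    in = bp.getRawInputStream();
  }catch(MessagingException e){
  }
  if(in==null) return "";
  // strip quoted-printable soft line breaks ("=" + LF or "=" + CRLF)
  if(contentEncoding!=null && contentEncoding.equalsIgnoreCase("quoted-printable")){
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int len;
    byte[] buffer = new byte[1024];
    try{
      while( (len=in.read(buffer, 0, buffer.length)) != -1 ){
        baos.write(buffer, 0, len);
      }
    }catch(IOException e){
    }
    byte[] b = baos.toByteArray();
    baos = new ByteArrayOutputStream();
    for(int j=0;j<b.length;j++){
      if(b[j]=='=' && j+1<b.length && b[j+1]=='\n'){
        j++;    // skip "=" + LF
      }else if(b[j]=='=' && j+2<b.length && b[j+1]=='\r' && b[j+2]=='\n'){
        j+=2;   // skip "=" + CRLF (the line ending MIME actually uses)
      }else{
        baos.write(b[j]);
      }
    }
    b = baos.toByteArray();
    in = new ByteArrayInputStream(b);
  }
  // decode the transfer encoding (quoted-printable, base64, ...)
  if(contentEncoding!=null){
    try{
      in = MimeUtility.decode(in, contentEncoding);
    }catch(MessagingException e){
    }
  }
  // read body
  Reader r = null;
  if(charset!=null){
    try{
      r = new InputStreamReader(in, charset);
    }catch(UnsupportedEncodingException e){
    }
  }
  if(r==null){
    // no charset, or an unsupported one: fall back to the platform default
    r = new InputStreamReader(in);
  }
  StringBuilder sb = new StringBuilder();
  BufferedReader br = null;
  try{
    br = new BufferedReader(r);
    String line = null;
    while( (line=br.readLine())!=null ){
      sb.append(line.trim());   // note: trim() also drops leading indentation
      sb.append("\n");
    }
  }catch(IOException e){
  }finally{
    if(br!=null){
      try{
        br.close();
      }catch(IOException e){}
    }
  }
  return sb.toString();
}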
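The charset parsing at the top of _decodeBody() is the other fragile step, since parameter names are case-insensitive and the value may be quoted. Below is a small standalone sketch of that step that can be tested without JavaMail; the class ContentTypeCharset is a hypothetical name, and it ignores RFC 2231 encoded parameters and ";" inside quoted values. Within JavaMail itself, javax.mail.internet.ContentType and its getParameter("charset") method do this parsing for you.

```java
/**
 * Hypothetical helper: extract the charset parameter from a Content-Type
 * header value, e.g. text/plain; charset="ISO-2022-JP" -> ISO-2022-JP.
 * Simplified: no RFC 2231 parameters, no ';' inside quoted strings.
 */
final class ContentTypeCharset {

    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq < 0) {
                continue;                       // the media type itself, e.g. "text/plain"
            }
            String name = p.substring(0, eq).trim();
            if (!name.equalsIgnoreCase("charset")) {
                continue;                       // parameter names are case-insensitive
            }
            String value = p.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);   // strip surrounding quotes
            }
            return value.isEmpty() ? null : value;
        }
        return null;
    }
}
```

The result can be passed straight to new InputStreamReader(in, charset); returning null when the parameter is missing lets the caller choose its own fallback.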

2010/03/06