Converting ISO-8859-1 to UTF-8 for MultipartFormData in Play2 + Scala when parsing email from Sendgrid

I have hooked up my Play2 + Scala application to the Sendgrid Parse API, and I'm really struggling with decoding and encoding the content of the email.

Since the emails can arrive in different encodings, Sendgrid provides a JSON object, charsets, that names the encoding of each field:

{"to":"UTF-8","cc":"UTF-8","subject":"UTF-8","from":"UTF-8","text":"iso-8859-1","html":"iso-8859-1"}

In my test case, "text" is "Med Vänliga Hälsningar Jakobs Webshop". If I extract that from the multipart request and print it out:

Logger.info(request.body.dataParts.get("text").get)

I get:

Med V?nliga H?lsningar Jakobs Webshop

OK, so with the charset info from Sendgrid, let's try to convert the string to UTF-8.

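import java.nio.ByteBuffer
import java.nio.charset.Charset

def parseMail = Action(parse.multipartFormData) { request =>

    // The part has already been turned into a String by the body parser;
    // getBytes() without arguments uses the platform default charset.
    val inputBuffer = request.body.dataParts.get("text").map {
        v => ByteBuffer.wrap(v.head.getBytes())
    }

    val fromCharset = Charset.forName("ISO-8859-1")
    val toCharset = Charset.forName("UTF-8")

    // Decode the bytes as ISO-8859-1 ...
    val data = fromCharset.decode(inputBuffer.get)
    Logger.info(""+data)

    // ... and re-encode them as UTF-8
    val outputBuffer = toCharset.encode(data)
    val text = new String(outputBuffer.array())

    // Save stuff to MongoDB instance

    Ok
}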

This results in:

Med V�nliga H�lsningar Jakobs Webshop

So this is very strange. This should work. I wonder what actually happens in the body parser parse.multipartFormData and the datapart handler:

def handleDataPart: PartHandler[Part] = {
  case headers @ PartInfoMatcher(partName) if !FileInfoMatcher.unapply(headers).isDefined =>
    Traversable.takeUpTo[Array[Byte]](DEFAULT_MAX_TEXT_LENGTH)
      .transform(Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))(play.core.Execution.internalContext))
      .flatMap { data =>
        Cont({
          case Input.El(_) => Done(MaxDataPartSizeExceeded(partName), Input.Empty)
          case in => Done(data, in)
        })
      }(play.core.Execution.internalContext)
}

When the data is consumed, a new String is created from the raw bytes using the utf-8 charset:

.transform(Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))(play.core.Execution.internalContext))

Does this mean that the bytes of my ISO-8859-1 encoded text are decoded as UTF-8 when parsed? If so, how should I write my parser so that it decodes each param according to the provided JSON object charsets? Clearly I'm doing something wrong, but I can't figure it out!

Have you tried changing the default encoding to UTF-8?

See this question for details: Printing Unicode from Scala interpreter
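
For context on why the default encoding matters here: the no-argument getBytes() and new String(bytes) calls in the question use the JVM's default charset, so their results can differ between machines. The following is only a small sketch to inspect that default and to show that passing an explicit charset gives the same bytes everywhere. The object name DefaultCharsetCheck is made up for illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo object, not part of Play or Sendgrid
object DefaultCharsetCheck {
  // "Vänliga", written with a Unicode escape so the source file encoding does not matter
  val sample: String = "V\u00e4nliga"

  // Explicit charsets: the byte counts are the same on every machine
  val latin1Bytes: Array[Byte] = sample.getBytes(StandardCharsets.ISO_8859_1)
  val utf8Bytes: Array[Byte] = sample.getBytes(StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    // The charset used by getBytes() and new String(bytes) when none is given
    println(Charset.defaultCharset())
    println(latin1Bytes.length) // 7: one byte per character
    println(utf8Bytes.length)   // 8: "ä" takes two bytes in UTF-8
  }
}
```

Changing the default would affect logging and the no-argument calls, but it would not change a charset that is passed explicitly, such as the "utf-8" in the data part handler.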

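You'll need to copy the implementation of the parse.multipartFormData function, change the decoding from utf-8 to iso-8859-1, and use that parser in your Action.

The problem is that Play decodes every data part as UTF-8 by default, and there is no way to change that other than implementing your own parser. By the time your Action sees the string, the invalid bytes have already been replaced, so re-encoding it afterwards cannot recover the original characters.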
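
To illustrate the decoding step such a custom parser would need, here is a minimal sketch in plain Scala, without any Play API. It assumes you have access to the raw bytes of a part and to the charset name Sendgrid reported for it. The object CharsetDemo and the helper decodePart are hypothetical names, not part of Play.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo object, not part of Play or Sendgrid
object CharsetDemo {
  // Hypothetical helper: decode one part's raw bytes with the charset
  // named in Sendgrid's "charsets" JSON (for example "iso-8859-1").
  def decodePart(bytes: Array[Byte], charsetName: String): String =
    new String(bytes, Charset.forName(charsetName))

  def main(args: Array[String]): Unit = {
    // "Med Vänliga Hälsningar Jakobs Webshop", using escapes for "ä"
    val original = "Med V\u00e4nliga H\u00e4lsningar Jakobs Webshop"

    // Simulate what arrives on the wire: ISO-8859-1 encoded bytes
    val rawBytes = original.getBytes(StandardCharsets.ISO_8859_1)

    // Decoding with the wrong charset replaces the invalid byte sequences
    // with U+FFFD, so the original characters are lost at this point
    val wrong = new String(rawBytes, StandardCharsets.UTF_8)

    // Decoding with the charset Sendgrid reported recovers the text
    val right = decodePart(rawBytes, "iso-8859-1")

    println(wrong.contains('\uFFFD')) // true
    println(right == original)        // true
  }
}
```

Once a part has been decoded correctly into a String, no further conversion "to UTF-8" is needed inside the JVM. The charset only matters again when the string is written out as bytes.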
