
How to ensure that Strings are in UTF-8?

How can I convert the String "the surveyÂ’s rules" to UTF-8 in Scala?

I tried these approaches, but none of them works:

scala> val text = "the surveyÂ’s rules"
text: String = the surveyÂ’s rules

scala> scala.io.Source.fromBytes(text.getBytes(), "UTF-8").mkString
res17: String = the surveyÂ’s rules

scala> new String(text.getBytes(),"UTF8")
res21: String = the surveyÂ’s rules

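OK, I resolved it this way. It is not a conversion, just a lenient read of the file:

import java.io.File
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}
// CSVReader is assumed to come from the scala-csv library (com.github.tototoshi.csv).

// Decode as US-ASCII and silently drop any byte that is not valid ASCII.
implicit val codec: Codec = Codec("US-ASCII").onMalformedInput(CodingErrorAction.IGNORE).onUnmappableCharacter(CodingErrorAction.IGNORE)

val src = Source.fromFile(new File(folderDestination + name + ".csv"))
val src2 = Source.fromFile(new File(folderDestination + name + ".csv"))

val reader = CSVReader.open(src.reader())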
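
To see what such a lenient codec does, here is a minimal self-contained sketch. It is not the code above: the LenientRead object, the temporary file, and the sample text are made up for illustration. It writes a short UTF-8 file and reads it back twice, once with the US-ASCII codec that ignores malformed input and once with a UTF-8 codec.

```scala
import java.nio.charset.{CodingErrorAction, StandardCharsets}
import java.nio.file.{Files, Path}
import scala.io.{Codec, Source}

object LenientRead {
  // Reads the whole file with the given codec and always closes the source.
  def readAll(path: Path, codec: Codec): String = {
    val src = Source.fromFile(path.toFile)(codec)
    try src.mkString finally src.close()
  }

  // Hypothetical sample: "cafe" with an acute accent on the e, stored as UTF-8.
  // Returns (text read as lenient US-ASCII, text read as UTF-8).
  def demo(): (String, String) = {
    val path = Files.createTempFile("lenient-read", ".csv")
    try {
      Files.write(path, "caf\u00e9".getBytes(StandardCharsets.UTF_8))
      val lenientAscii = Codec("US-ASCII")
        .onMalformedInput(CodingErrorAction.IGNORE)
        .onUnmappableCharacter(CodingErrorAction.IGNORE)
      (readAll(path, lenientAscii), readAll(path, Codec.UTF8))
    } finally Files.delete(path)
  }
}
```

The lenient US-ASCII read drops the accented character entirely, so this approach hides the garbage rather than decoding it. If the file really is UTF-8, reading it with a UTF-8 codec keeps the text intact.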

Note that when you call text.getBytes() without arguments, you're in fact getting an array of bytes representing the string in your platform's default encoding. On Windows, for example, it could be some single-byte encoding; on Linux it can be UTF-8 already.
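
A small sketch of this point, with a hypothetical DefaultCharsetDemo object that is not part of the original answer: the no-argument getBytes() matches an explicit call with the platform default charset, and decoding bytes with a different charset than the one used to encode them garbles non-ASCII characters.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object DefaultCharsetDemo {
  // getBytes() with no argument uses the platform default charset.
  def sameAsDefault(text: String): Boolean =
    text.getBytes().sameElements(text.getBytes(Charset.defaultCharset()))

  // Encode as UTF-8 but decode as ISO-8859-1: each non-ASCII character
  // turns into two or more unrelated characters.
  def garble(text: String): String =
    new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1)
}
```

For example, DefaultCharsetDemo.garble("\u00e9") returns the two characters U+00C3 and U+00A9, while plain ASCII text passes through unchanged. A string that already contains such garbage is not repaired by encoding and decoding it with one and the same charset, which is why the attempts in the question return the text unchanged.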

To be correct, you need to specify the exact encoding in the getBytes() call. For Java 7 and later, do this:

import java.nio.charset.StandardCharsets

val bytes = text.getBytes(StandardCharsets.UTF_8)

For Java 6 do this:

import java.nio.charset.Charset

val bytes = text.getBytes(Charset.forName("UTF-8"))

Then bytes will contain UTF-8-encoded text.
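
As a rough illustration, the explicit-charset call can be wrapped in a round trip. The Utf8RoundTrip object and its helper names are assumptions made for this sketch only.

```scala
import java.nio.charset.StandardCharsets

object Utf8RoundTrip {
  // Encode with an explicit charset instead of the platform default.
  def encode(text: String): Array[Byte] = text.getBytes(StandardCharsets.UTF_8)

  // Decode with the same explicit charset.
  def decode(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)

  // Space-separated lowercase hex view of the bytes, for inspection.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")
}
```

With this, Utf8RoundTrip.hex(Utf8RoundTrip.encode("\u2019")) gives "e2 80 99", the three-byte UTF-8 form of the right single quotation mark, and decoding those bytes with the same charset returns the original string on any platform.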

Just set the JVM's file.encoding parameter to UTF-8 as follows:

-Dfile.encoding=UTF-8

This makes UTF-8 the JVM's default encoding.

With the scala launcher, this would be scala -Dfile.encoding=UTF-8.
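
To check which default the running JVM actually uses, a small sketch like the following may help. EncodingCheck is a hypothetical name; the calls are standard JDK APIs. On JDK 18 and later the default charset is already UTF-8 unless it is overridden.

```scala
import java.nio.charset.Charset

object EncodingCheck {
  // Charset used by calls such as getBytes() with no argument.
  def defaultCharsetName: String = Charset.defaultCharset().name()

  // Value of the file.encoding system property, if it is set.
  def fileEncodingProperty: Option[String] = Option(System.getProperty("file.encoding"))

  // One-line summary suitable for printing at startup.
  def report: String =
    s"default charset = $defaultCharsetName, file.encoding = ${fileEncodingProperty.getOrElse("(unset)")}"
}
```

Printing EncodingCheck.report at startup shows whether the flag took effect in the process that runs the code.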
