I am reading pipe-delimited text from a flat file and am getting an error when parsing it. I am an old Java hand, but I haven't touched it for a few years. Here is the code:
String zipString = tokenizerForOneLine.nextToken();
System.out.println( "Zip String: -->" + zipString + "<--");
//zipString = "18103"; <<<This works!!!
int zipInt = Integer.parseInt( zipString );
aProvider.setZipCode( zipInt );
Here is the output:
Zip String: -->�1�8�1�0�3�<--
java.lang.NumberFormatException: For input string: "�1�8�1�0�3�"
NumberFormatException while reading file.
Detailed Message: For input string: "�1�8�1�0�3�"
My naive guess is that it is an encoding issue. Is this possible? It makes no sense to me. Or am I doing something really dumb and just not seeing it?
How do I diagnose the encoding issue? (My data vendor claims it is in standard UNICODE).
Thanks-in-advance,
Guido
Make sure you are building a reader with the proper encoding. Your code should look like this:
// "encoding" is the charset name of the file, for example "UTF-16"
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("data.csv"), encoding));
String line;
while ((line = in.readLine()) != null) {
    // split each line on the pipe delimiter
    StringTokenizer tokenizer = new StringTokenizer(line, "|");
    ...
}
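The encoding is probably UTF-16. Your output shows an extra character next to every digit, which is what you typically see when a file that uses two bytes per character is decoded with a single-byte charset.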
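To see why a wrong charset would produce that kind of output, here is a minimal sketch. It assumes the file is UTF-16LE and that the wrong charset was ISO-8859-1; neither is confirmed by the question. The class name Utf16Demo, the helper secondField, and the sample record "PA|18103|x" are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.StringTokenizer;

class Utf16Demo {
    // Decode raw bytes with the given charset and return the second pipe-delimited field.
    static String secondField(byte[] raw, Charset charset) {
        String line = new String(raw, charset);
        StringTokenizer tokenizer = new StringTokenizer(line, "|");
        tokenizer.nextToken();        // skip the first field
        return tokenizer.nextToken(); // the zip code field
    }

    public static void main(String[] args) {
        // Hypothetical record, encoded the way a UTF-16LE file would store it.
        byte[] raw = "PA|18103|x".getBytes(StandardCharsets.UTF_16LE);

        // Wrong charset (assumed): every byte, including the zero bytes, becomes its own char.
        String wrong = secondField(raw, StandardCharsets.ISO_8859_1);
        System.out.println("length with wrong charset: " + wrong.length());

        // Matching charset: the field contains only the digits and parses cleanly.
        String right = secondField(raw, StandardCharsets.UTF_16LE);
        System.out.println("parsed zip: " + Integer.parseInt(right));
    }
}
```

With the wrong charset the zip field picks up a zero character around each digit, so Integer.parseInt rejects it. With the matching charset the same bytes decode to plain digits.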
Also, if the file starts with a byte order mark (BOM), you might use the BOMInputStream from Commons IO to detect the encoding automatically.
http://commons.apache.org/io/api-release/org/apache/commons/io/input/BOMInputStream.html
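If you would rather not add a dependency, the same idea can be sketched by hand with the standard library. This is not the Commons IO implementation, just a rough stand-in; the class name BomSniffer and the method sniffBom are hypothetical, and it only recognizes the UTF-8 and UTF-16 marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // Return the charset implied by a leading byte order mark, or null if none is recognized.
    // Only UTF-8 and UTF-16 are checked; a UTF-32LE mark (FF FE 00 00) would be reported as UTF-16LE.
    static Charset sniffBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null; // no recognizable mark; the encoding has to come from somewhere else
    }

    public static void main(String[] args) {
        // Hypothetical first four bytes of a file: a UTF-16LE mark followed by the digit '1'.
        byte[] head = {(byte) 0xFF, (byte) 0xFE, 0x31, 0x00};
        Charset detected = sniffBom(head);
        System.out.println(detected == null ? "no BOM found" : detected.name());
    }
}
```

In practice you would read the first few bytes of the file into head. Note also that Java's own "UTF-16" charset consumes a leading byte order mark when decoding and assumes big-endian when there is none, so passing "UTF-16" as the encoding may already be enough for a file that has a mark.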