
Read a CSV file in UTF-8 format

I am reading a CSV file in Java, adding a new column with new information, and exporting it back to a CSV file. I have a problem reading the CSV file in UTF-8 format. I read it line by line and store each line in a StringBuilder, but when I print a line I can see that the text is not being decoded as UTF-8 but as ANSI. I tried both System.out.print and a PrintStream set to UTF-8, and the text still appears in ANSI. This is my code:

    BufferedReader br;
    try {
        br = new BufferedReader(new InputStreamReader(new FileInputStream(
                "./users.csv"), "UTF8"));
        String line;
        while ((line = br.readLine()) != null) {
            if (line.contains("none@none.com")) {
                continue;
            }
            if (!line.contains("@") && !line.contains("FirstName")) {
                continue;
            }
            PrintStream ps = new PrintStream(System.out, true, "UTF-8");
            ps.print(line + "\n");
            sbusers.append(line);
            sbusers.append("\n");
            sbusers2.append(line);
            sbusers2.append(",");
        }
        br.close();
    } catch (IOException e) {
        System.out.println("Failed to read users file.");
    } finally {
    }

It prints out text like "Professor -P s". Since the file isn't being read correctly, the output written to the new file also ends up in ANSI.

Are you sure your CSV is UTF-8 encoded? My guess is that it's not. Try using ISO-8859-1 for reading the file, but keep the output as UTF-8. (UTF8 and UTF-8 both tend to work, but you should use UTF-8, as @Marcelo suggested.)
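Here is a minimal sketch of that suggestion: decode the input with one charset and write the output as UTF-8. The class and method names (CsvTranscoder, toUtf8) are hypothetical, and whether ISO-8859-1 is the right source charset depends on how the file was actually saved.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the question's code.
final class CsvTranscoder {

    /** Copies a text file line by line, decoding with sourceCharset and encoding as UTF-8. */
    static void toUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine(); // uses the platform line separator
            }
        }
    }
}
```

It could be called as CsvTranscoder.toUtf8(Paths.get("./users.csv"), StandardCharsets.ISO_8859_1, Paths.get("./users-utf8.csv")), where the output file name is just an example.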

In the line:

    br = new BufferedReader(new InputStreamReader(new FileInputStream("./users.csv"), "UTF8"));

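Your charset should be "UTF-8", not "UTF8". "UTF-8" is the canonical name; Java also accepts "UTF8" as a legacy alias, so this change alone is unlikely to fix the output.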
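For what it's worth, you can check how the JVM resolves the two spellings. This small sketch (the class name CharsetNames is made up) compares the Charset objects that each name maps to; on a standard JDK, "UTF8" is registered as an alias of UTF-8.

```java
import java.nio.charset.Charset;

// Hypothetical helper for comparing charset names.
final class CharsetNames {

    /** Returns true if both names resolve to the same charset. */
    static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        System.out.println(sameCharset("UTF8", "UTF-8"));   // prints "true"
        System.out.println(Charset.forName("UTF8").name()); // prints "UTF-8"
    }
}
```

Using StandardCharsets.UTF_8 instead of a string avoids the spelling question altogether.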

Printing to System.out using UTF-8 encoding?

Why would you do that? The encoding that System.out uses is determined at the OS level (it becomes the default charset in the JVM), and that's the only one you want to use on System.out.

First, as suggested by @Marcelo, use UTF-8 instead of UTF8:

    BufferedReader in = new BufferedReader(
            new InputStreamReader(
                    new FileInputStream("./users.csv"), "UTF-8"));

Second, forget about the PrintStream; just use System.out or, better yet, a logging API. You don't need to worry about how Java will output your string to the console. (The number one rule of character encoding: after you've read things successfully, let Java handle the encoding, and only worry about it again when you are writing to an external file, database, etc.)
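As a rough sketch of that rule applied to the question's task: name the charset when reading and again when writing the new file, and do nothing special in between. The class, method, and parameter names (UsersCsv, addColumn, header, value) are hypothetical; the filters mirror the loop in the question. Note that Files.newBufferedReader throws an IOException on malformed input instead of silently substituting characters, which makes a wrongly encoded file easier to spot.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, assuming the input really is UTF-8.
final class UsersCsv {

    /** Copies the kept lines from in to out, appending one extra column to each. */
    static void addColumn(Path in, Path out, String header, String value) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains("none@none.com")) {
                    continue; // same placeholder filter as in the question
                }
                boolean isHeader = line.contains("FirstName");
                if (!line.contains("@") && !isHeader) {
                    continue; // neither a user row nor the header row
                }
                bw.write(line + "," + (isHeader ? header : value));
                bw.newLine();
            }
        }
    }
}
```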

Third and more important, check that your file is really encoded in UTF-8; a file that is not in the encoding you assume is by far the most common cause of encoding problems.

Make sure that you test with a real UTF-8 file (use tools like iconv to convert to UTF-8 and be sure about it).
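If you would rather check from Java, a strict decoder can tell you whether the bytes are well-formed UTF-8. This is only a sketch (the class and method names are made up), and it has a limit: passing the check does not prove the file was meant as UTF-8, since pure ASCII content is valid in both UTF-8 and ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for validating UTF-8 input.
final class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of replacing
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

For the file in the question, that would be Utf8Check.isValidUtf8(Files.readAllBytes(Paths.get("./users.csv"))).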

I found a potential solution (I had the same problem). Depending on how the file is actually encoded, you may need to specify a different charset when reading it.

Replace:

    br = new BufferedReader(new InputStreamReader(new FileInputStream(
            "./users.csv"), "UTF8"));

With:

    br = new BufferedReader(new InputStreamReader(new FileInputStream(
            "./users.csv"), "ISO_8859_1"));

For further understanding: https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/


 