
Java: reading a file with strings from different languages

I made a program that reads several text files and combines them into a single .csv file. The .csv file contains translations in English, Dutch, French, Italian, Portuguese and Spanish.

Now here is my problem:

In the end I get a nicely filled .csv file with all the translations together. I read the files as UTF-8, and every language is displayed correctly except French. Some characters are shown as question marks, for example "Mis ? jour" where it should be "Mis à jour".

Below is the method that reads the file for each language and builds objects from its contents, so that I can sort them and put them in the right place in the .csv file.

Each file contains lines like these:

To Airport;A l'aéroport

Today;Aujourd'hui

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public static Language getTranslations(String inputFileName) {
    Language language = new Language();

    // Decode the file as UTF-8; try-with-resources closes the reader
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(inputFileName), StandardCharsets.UTF_8))) {
        String strLine;
        // Read the file line by line
        while ((strLine = br.readLine()) != null) {
            // Each line has the form "key;translation"
            String[] values = strLine.split(";");
            if (values.length == 2) {
                language.putTranslationItem(values[0], values[1]);
            }
        }
    } catch (IOException e) {
        // Report the failure instead of silently swallowing it
        e.printStackTrace();
    }

    return language;
}

I hope somebody can help out!

Thanks

I am not completely sure about this, but you can try converting the values[0] and values[1] strings into byte arrays:

byte[] value_0_utfString = values[0].getBytes("UTF-8") ;
byte[] value_1_utfString = values[1].getBytes("UTF-8") ;

and then convert them back into strings:

String str_0 = new String(value_0_utfString, "UTF-8");
String str_1 = new String(value_1_utfString, "UTF-8");

I am not sure whether this is the right or most efficient way, but since a single line contains both English and French, I thought splitting and re-encoding might help. I haven't tried this myself.
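As a cautious side note, here is a minimal sketch for testing the idea above. It rests on an assumption the question does not confirm: that the French file may have been saved in an encoding other than UTF-8, with ISO-8859-1 used here purely as an example. The class name EncodingCheck and its two helper methods are hypothetical. The sketch shows that encoding a string to UTF-8 and decoding it again returns the same string, so the round trip by itself probably cannot repair characters that were already lost, and that a strict decoder can tell whether a file's bytes are well-formed UTF-8 at all.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    // Encode to UTF-8 and decode again: the result equals the input,
    // so this cannot restore a character that was already decoded wrongly.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "Mis \u00e0 jour"; // "Mis à jour"

        // Assumption for illustration: the same text saved in two encodings.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(roundTrip(text).equals(text)); // true
        System.out.println(isValidUtf8(utf8));            // true
        System.out.println(isValidUtf8(latin1));          // false

        // Decoding with the charset the bytes were written in recovers the text.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).equals(text)); // true
    }
}
```

If a check like isValidUtf8 returned false for the bytes of the French file, passing that file's actual charset to InputStreamReader would be the thing to try. If it returned true, the problem more likely lies in how the .csv file is written or displayed.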
