
Java: Pinyin special characters not outputting

import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class TextReader {

    public static void main(String[] args) throws FileNotFoundException {

        // HashMap<String, Integer> hashmap = new HashMap<String, Integer>();
        TreeMap<String, Integer> hashmap = new TreeMap<String, Integer>();
        // get the file and put it into the file variable
        File file = new File("/Desktop/TextSampleWordCount.txt");
        // scan the file in, decoding it as UTF-8
        Scanner pinyintextfile = new Scanner(file, "UTF-8");

        while (pinyintextfile.hasNext()) {
            String word = pinyintextfile.next();
            if (hashmap.containsKey(word)) {
                // if the word is already in the map, increment its count
                int count = hashmap.get(word) + 1;
                hashmap.put(word, count);
            } else {
                // if the word is not in the map yet, create a new entry for it
                hashmap.put(word, 1);
            }
        }

        pinyintextfile.close();
        for (Map.Entry<String, Integer> entry : hashmap.entrySet()) {
            System.out.println(entry);
        }
    }
}

The program counts how many times each pinyin word appears. The problem is that the tone-marked characters are printed as question marks, like this:

Ch?ng =1

Zh?= 3

etc. I tried searching for the problem, but nothing helped. I also referred to https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html and changed the charsetName, but it still outputs question marks. I am not sure what I'm doing wrong. Could it be my IDE?
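You can check whether the problem is in reading or in printing by looking at the actual characters in each word without printing them directly. Below is a minimal diagnostic sketch. The class name `EncodingCheck` and the helper `toCodePoints` are hypothetical. It prints the JVM's default charset and converts a word to plain-ASCII `U+XXXX` code points.

If you call `toCodePoints(word)` on a word read by the `Scanner` and you see something like `U+01D2` where the question mark appears, the file was decoded correctly. In that case the `?` is added at output time. Java's encoder substitutes `?` for characters that the output charset cannot represent. If you see `U+003F` (a literal `?`), the data was already lost on input.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // Returns the Unicode code points of a string as "U+XXXX" values separated by spaces.
    // The result is plain ASCII, so it prints correctly whatever encoding the console uses.
    static String toCodePoints(String word) {
        StringBuilder sb = new StringBuilder();
        word.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The charset the JVM uses by default when converting between bytes and text.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Example word written with a Unicode escape so the source file's own encoding cannot affect it.
        String word = "Ch\u01D2ng";
        System.out.println(toCodePoints(word));
    }
}
```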

The file looks like this:

Yù kè sù dǎo zhú zhǐ bì shè mù qiú zhēng zì běn yáng qī yán biǎo. Dú zhǒng lǎn yǐ wén yǒu zhèng cái shū cān sè luò shè láo zì yū xuě qián mù wàn. Yū bàn shè shí lǐ wài gèng ér jiāo xī qì shàn xiāng xiào. Wén sēn dé yì fā hù luòzhuǎn quán dào nián měi jì shì chūguò gé shū. Tài jué zhī néngshǒu sòng xiě qiú xù tū tóu jī shòu wèi zhì diào tú yù ān néng. Zhì fù qǐ jiè xíngshì jué zhǐ dǒng zhǔ sè shí yì jì. Dú shè hǎo rì jì zhì qì shǒu xué jí jūn yè zhì shè chēzuò xī zhōngyán míng. Tè shēng yì zhōng shè tóu néng gōng chūshān zuò shēn yàn. Lì fàn duō quán mǎ huà zhèng jì zhì kāngdìng wèn yǒng zǒng.
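This sample has two features that matter for the count. Some words carry trailing punctuation ("biǎo.", "wàn."), and sentences start with a capital ("Yù", "Dú"). `Scanner.next()` only splits on whitespace, so "biǎo." and "biǎo" would be counted as different words.

If that is not what you want, here is a minimal sketch of an alternative counter. The class and method names are hypothetical. It assumes a word is a run of Unicode letters or combining marks and that counts should ignore case. It reads the file explicitly as UTF-8 and uses `Map.merge` instead of the `containsKey`/`put` pair. Note that it still prints through `System.out`, so it does not by itself fix the question-mark output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PinyinWordCount {

    // Counts words case-insensitively, dropping punctuation such as trailing periods.
    // Assumption: a word is a run of Unicode letters (\p{L}) or combining marks (\p{M}),
    // so tone marks stay attached to their syllable in both precomposed and decomposed text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("[^\\p{L}\\p{M}]+")) {
            if (token.isEmpty()) {
                continue; // split() returns a leading empty string when the text starts with a separator
            }
            // merge() stores 1 for a new word, or adds 1 to the existing count.
            counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the text file; its bytes are decoded explicitly as UTF-8.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> entry : countWords(text).entrySet()) {
            System.out.println(entry);
        }
    }
}
```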

This is most likely caused by your Eclipse console's encoding settings. Go to Run > Run Configurations > Common (tab) > Encoding and check whether the default encoding is "UTF-8". If it is not, select "UTF-8" as the encoding and run your program again.
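If you would rather not depend only on an IDE setting, you can also have the program encode its own output as UTF-8. Below is a minimal sketch; `Utf8ConsoleOutput` and `utf8` are hypothetical names. It replaces `System.out` with a `PrintStream` that writes UTF-8 bytes to standard output, so the existing `System.out.println(entry)` loop stays unchanged.

This only controls which bytes Java writes. The console (Eclipse's Encoding setting above, or a terminal's code page) must still decode them as UTF-8. Otherwise you will see garbled characters instead of question marks.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8ConsoleOutput {

    // Wraps a byte stream in an auto-flushing PrintStream that encodes text as UTF-8.
    static PrintStream utf8(OutputStream os) {
        try {
            return new PrintStream(os, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Replace System.out so later System.out.println(...) calls emit UTF-8 bytes.
        System.setOut(utf8(new FileOutputStream(FileDescriptor.out)));
        // Hypothetical example entry, written with a Unicode escape.
        System.out.println("Ch\u01D2ng=1");
    }
}
```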

