
Reading Unicode characters in Java

I am trying to read Unicode characters from a text file saved in UTF-8 using Java. My text file is as follows:

अ, अदेबानि ,अन, अनसुला, अनसुलि, अनफावरि, अनजालु, अनद्ला, अमा, अर, अरगा, अरगे, अरन, अराय, अलखद, असे, अहा, अहिंसा, अग्रं, अन्थाइ, अफ्रि, बियन, खियन, फियन, बन, गन, थन, हर, हम, जम, गल, गथ, दरसे, दरनै, थनै, थथाम, सथाम, खफ, गल, गथ, मिख, जथ, जाथ, थाथ, दद, देख, न, नेथ, बर, बुंथ, बिथ, बिख, बेल, मम, आ, आइ, आउ, आगदा, आगसिर

I have tried the following code:

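import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class UcharRead
{
    public static void main(String args[])
    {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader bufReader = new BufferedReader(
                new InputStreamReader(new FileInputStream("research_words.txt"), StandardCharsets.UTF_8)))
        {
            String str;
            while ((str = bufReader.readLine()) != null)
            {
                System.out.println(str);
            }
        }
        catch (IOException e)
        {
            // report failures instead of silently swallowing them
            e.printStackTrace();
        }
    }
}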

The output I get is ????????????????????????. Can anyone help me?

You are (most likely) reading the text correctly, but when you write it out, you also need to enable UTF-8. Otherwise every character that cannot be printed in your default encoding will be turned into question marks.
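To see where the question marks come from, here is a minimal sketch. The class name QuestionMarkDemo and the helper roundTrip are made up for illustration. It encodes a string with a given charset and decodes it again; with a charset that has no Devanagari letters (US-ASCII is used here as a stand-in for a limited default encoding), each unmappable character is replaced by '?'. The sample text uses \u escapes so the source file itself does not depend on the compiler's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encodes the text with the given charset and decodes it again.
    // String.getBytes(Charset) substitutes the charset's replacement
    // byte for every character it cannot represent.
    public static String roundTrip(String text, Charset cs) {
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // Two Devanagari letters (U+0905, U+0928), written as escapes
        String sample = "\u0905\u0928";

        // US-ASCII cannot represent these letters, so both become '?'
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII));

        // UTF-8 can represent them, so the round trip is lossless
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample));
    }
}
```

The same substitution happens inside System.out when its encoding cannot represent the characters being printed.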

Try writing it to a File instead of System.out (and specify the proper encoding):

Writer w = new OutputStreamWriter(
   new FileOutputStream("x.txt"), "UTF-8");
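A slightly fuller sketch of the same idea, assuming you want both a UTF-8 file and UTF-8 console output. The class name Utf8Output and the helper writeUtf8 are hypothetical, and the file name x.txt is just the one from the snippet above. Wrapping System.out in a PrintStream with an explicit encoding only helps if the terminal itself displays UTF-8.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Output {
    // Writes the text to the stream as UTF-8, ignoring the platform default.
    public static void writeUtf8(OutputStream out, String text) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write(text);
        w.flush(); // flush only; the caller owns and closes the stream
    }

    public static void main(String[] args) throws IOException {
        // Devanagari sample written as escapes (U+0905 U+092E U+093E)
        String sample = "\u0905\u092e\u093e";

        // 1) Write to a file with an explicit encoding
        try (OutputStream fileOut = Files.newOutputStream(Paths.get("x.txt"))) {
            writeUtf8(fileOut, sample + System.lineSeparator());
        }

        // 2) Print to the console through a UTF-8 PrintStream
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(sample);
    }
}
```

Opening x.txt in an editor that understands UTF-8 should then show the original text even when the console does not.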

If you are reading the text properly using UTF-8 encoding, then make sure that your console also supports UTF-8. If you are using Eclipse, you can enable UTF-8 encoding for your console via:

Run Configuration -> Common -> Encoding -> select UTF-8
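Before changing any settings, it can help to see what the JVM currently assumes. The following is a rough sketch (the class EncodingCheck and the method canEncode are names invented here). It prints the default charset and asks whether that charset can represent a Devanagari sample. Note that this is only a proxy: the console or terminal may use a different encoding than the JVM default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    // Returns true if every character of the text can be encoded in cs.
    public static boolean canEncode(Charset cs, String text) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("Default charset: " + def.name());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Two Devanagari letters (U+0905, U+0928), written as escapes
        String sample = "\u0905\u0928";
        System.out.println("Default can encode sample: " + canEncode(def, sample));
        System.out.println("UTF-8 can encode sample:   "
                + canEncode(StandardCharsets.UTF_8, sample));
    }
}
```

If the first answer is false, question marks on output are expected until the output encoding is changed.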


You're reading it correctly - the problem is almost certainly just that your console can't handle the text. The simplest way to verify this is to print out each char within the string. For example:

public static void dumpString(String text) {
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        System.out.printf("%c - %04x\n", c, (int) c);
    }
}

You can then verify that each character is correct using the Unicode code charts.
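If checking the charts by hand gets tedious, the comparison can be partly automated. This is only a sketch under the assumption that every word in the file should consist purely of Devanagari characters (as in the question's sample); the class DevanagariCheck and method isDevanagariWord are hypothetical names. It uses Character.UnicodeBlock to test each code point.

```java
class DevanagariCheck {
    // Returns true if the word is non-empty and every code point
    // lies in the Devanagari block (U+0900 to U+097F).
    public static boolean isDevanagariWord(String word) {
        if (word.isEmpty()) {
            return false;
        }
        for (int i = 0; i < word.length(); ) {
            int cp = word.codePointAt(i);
            if (Character.UnicodeBlock.of(cp) != Character.UnicodeBlock.DEVANAGARI) {
                return false;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return true;
    }

    public static void main(String[] args) {
        // A correctly decoded word (U+0905 U+092E U+093E)
        System.out.println(isDevanagariWord("\u0905\u092e\u093e"));
        // What a wrongly decoded word might look like in memory
        System.out.println(isDevanagariWord("???"));
    }
}
```

If this returns true for the words read from the file, the input side is fine and the remaining problem is on the output side.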

Once you've verified that you're reading the file correctly, you can then work on the output side of things - but it's important to try to focus on one side of it at a time. Trying to diagnose potential failures in both input and output encodings at the same time is very hard.
