Reading Unicode characters in Java

I am trying to read Unicode characters from a text file saved in UTF-8 using Java. My text file is as follows:

अ, अदेबानि ,अन, अनसुला, अनसुलि, अनफावरि, अनजालु, अनद्ला, अमा, अर, अरगा, अरगे, अरन, अराय, अलखद, असे, अहा, अहिंसा, अग्रं, अन्थाइ, अफ्रि, बियन, खियन, फियन, बन, गन, थन, हर, हम, जम, गल, गथ, दरसे, दरनै, थनै, थथाम, सथाम, खफ, गल, गथ, मिख, जथ, जाथ, थाथ, दद, देख, न, नेथ, बर, बुंथ, बिथ, बिख, बेल, मम, आ, आइ, आउ, आगदा, आगसिर

I have tried the following code:

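import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class UcharRead
{
    public static void main(String[] args)
    {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader bufReader = new BufferedReader(
                new InputStreamReader(new FileInputStream("research_words.txt"), "UTF-8")))
        {
            String str;
            while ((str = bufReader.readLine()) != null)
            {
                System.out.println(str);
            }
        }
        catch (IOException e)
        {
            // report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}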

The output I get is ????????????????????????. Can anyone help me?

You are (most likely) reading the text correctly, but when you write it out, you also need to use UTF-8 on the output side. Otherwise every character that cannot be represented in your default encoding will be turned into a question mark.
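To see where the question marks come from, here is a minimal sketch that pushes text through an encoding that cannot represent it. The class name QuestionMarkDemo and the helper roundTrip are made up for this illustration, and US-ASCII merely stands in for whatever default encoding the console happens to use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Hypothetical helper: encodes the text with the given charset and decodes
    // it again, roughly what happens when text passes through an output stream.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String word = "\u0905\u0928"; // the Devanagari word "अन" from the sample file

        // US-ASCII cannot represent Devanagari, so each char is replaced by '?'.
        System.out.println(roundTrip(word, StandardCharsets.US_ASCII)); // prints "??"

        // UTF-8 can represent it, so the text survives unchanged.
        System.out.println(roundTrip(word, StandardCharsets.UTF_8).equals(word)); // prints "true"
    }
}
```

The string in memory is fine in both cases; the damage happens only at the point where it is encoded for output.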

Try writing it to a file instead of System.out (and specify the proper encoding):

Writer w = new OutputStreamWriter(
   new FileOutputStream("x.txt"), "UTF-8");
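Putting the reader from the question and this writer together might look like the sketch below. The class name Utf8Copy and the method copyUtf8 are invented for the example; the file names are the ones already used above.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8Copy {
    // Hypothetical helper: reads a UTF-8 text file line by line and writes
    // each line to the target file, also encoded as UTF-8.
    static void copyUtf8(String source, String target) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream(source), StandardCharsets.UTF_8));
             Writer writer = new OutputStreamWriter(
                 new FileOutputStream(target), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write(System.lineSeparator());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copyUtf8("research_words.txt", "x.txt");
    }
}
```

Opening x.txt in an editor that understands UTF-8 should then show the original words, which takes the console out of the picture entirely.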

If you are reading the text properly using UTF-8 encoding, then make sure that your console also supports UTF-8. If you are using Eclipse, you can enable UTF-8 encoding for your console via:

Run Configurations -> Common -> Encoding -> select UTF-8

Here is the Eclipse screenshot:

[screenshot of the Eclipse encoding setting]
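The console setting only covers the receiving end; the program also has to send UTF-8 bytes. One way to do that, sketched here under the assumption that the console really is set to UTF-8, is to wrap System.out in a PrintStream with an explicit encoding. The class name Utf8Console and the helper utf8PrintStream are made up for the example.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Hypothetical helper: wraps an output stream in an auto-flushing
    // PrintStream that encodes everything it prints as UTF-8.
    static PrintStream utf8PrintStream(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8PrintStream(System.out);
        out.println("\u0905\u0928"); // the Devanagari word "अन"
    }
}
```

If the console is not actually using UTF-8, this prints garbage instead of question marks, which at least tells you that the bytes are arriving and the display side is the part to fix.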

You're reading it correctly - the problem is almost certainly just that your console can't handle the text. The simplest way to verify this is to print out each char within the string. For example:

public static void dumpString(String text) {
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        System.out.printf("%c - %04x\n", c, (int) c);
    }
}

You can then verify that each character is correct using the Unicode code charts.
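Part of that lookup can be automated. The sample file appears to contain only Devanagari letters plus ASCII commas and spaces, so a rough check, assuming that holds for the whole file, is to test which Unicode block each code point falls in. The class name BlockCheck and the method isAsciiOrDevanagari are invented for this sketch.

```java
class BlockCheck {
    // Hypothetical helper: returns true if every code point in the text is
    // either plain ASCII or belongs to the Devanagari block (U+0900 to U+097F).
    static boolean isAsciiOrDevanagari(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean ascii = cp < 0x80;
            boolean devanagari =
                Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.DEVANAGARI;
            if (!ascii && !devanagari) {
                return false;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiOrDevanagari("\u0905, \u0905\u0928")); // prints "true"
        // UTF-8 bytes of U+0905 wrongly decoded as ISO-8859-1 give Latin-1 characters.
        System.out.println(isAsciiOrDevanagari("\u00E0\u00A4\u0085")); // prints "false"
    }
}
```

A literal '?' is ASCII and passes this check, so it catches text that was decoded with the wrong charset but not text that has already been replaced by question marks.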

Once you've verified that you're reading the file correctly, you can then work on the output side of things - but it's important to focus on one side at a time. Trying to diagnose potential failures in both input and output encodings at the same time is very hard.
