Pattern matching with Chinese characters (encoded in UTF-8) in Java
I need to check whether a Chinese province is contained within an address in Chinese.
I am able to read and write Chinese characters easily.
I tried to use the indexOf() method of String to check whether a province (e.g. 广东) is contained within an address (e.g. 中国 广东). However, this always returns -1.
When I check for numbers (e.g. whether 103 is contained within 9910399), it works fine.
Do I need to do something different to handle UTF-8 string matching?
Thanks.
Matt
I have just tried your example, and indexOf() works fine for me, even though I do not have Chinese fonts on my system and the characters are not displayed correctly.
So, check the encoding of your source files (*.java). For example, if you are using Eclipse, check it under Window/Preferences/General/Workspace/Text file encoding. I am using UTF-8.
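One way to rule out the source file encoding is to write the Chinese literals as Unicode escapes, which are plain ASCII and compile the same way under any encoding. A minimal sketch follows; the class name ProvinceMatch and the helper containsProvince are hypothetical names chosen for illustration, not something from the question.

```java
class ProvinceMatch {
    // Hypothetical helper: true if the address contains the province name.
    static boolean containsProvince(String address, String province) {
        return address.indexOf(province) >= 0;
    }

    public static void main(String[] args) {
        // Unicode escapes keep these literals independent of the
        // encoding used to save or compile this file.
        String address = "\u4e2d\u56fd \u5e7f\u4e1c"; // the address from the question
        String province = "\u5e7f\u4e1c";             // the province from the question

        System.out.println(address.indexOf(province));           // prints 3
        System.out.println(containsProvince(address, province)); // prints true
    }
}
```

If this version finds the province while the same check with literal Chinese characters returns -1, the source file encoding is the likely culprit.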
The second thing is the encoding used by the Java compiler. In the case of Eclipse, you do not have to specify anything. For javac, I think you should probably set the encoding explicitly using -encoding. Otherwise, the default OS encoding will probably be used.
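To see why an encoding mismatch can make indexOf() return -1, the sketch below simulates one: it encodes the address as UTF-8 and then decodes those bytes as ISO-8859-1. The class and method names are hypothetical, and ISO-8859-1 is only an assumed stand-in for whatever non-UTF-8 default might be in effect. The search string is decoded correctly while the address is not, so the two no longer share any characters.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Hypothetical helper: decodes the UTF-8 bytes of `text` with the wrong
    // charset, imitating text that was read with a mismatched encoding.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String address = "\u4e2d\u56fd \u5e7f\u4e1c"; // the address from the question
        String province = "\u5e7f\u4e1c";             // the province from the question
        String garbled = misdecode(address);

        System.out.println(address.indexOf(province)); // prints 3
        System.out.println(garbled.indexOf(province)); // prints -1
        // Each Chinese character became three single-byte characters.
        System.out.println(garbled.length());          // prints 13
    }
}
```

With javac, the matching fix is to state the encoding when compiling, for example javac -encoding UTF-8 followed by the source file name.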
Good luck.