
Java string with English and Chinese text

I have a runtime string that may contain English text mixed with Chinese or Japanese text, e.g. John (漢字). I want to parse this text and extract the non-English characters.

Calling indexOf on the brackets returns -1. Could anyone point me in the right direction?

String str = "John (漢字)";
int startIndex = str.indexOf("(");
int endIndex = str.indexOf(")");

Your code runs fine when I try it.

If indexOf returns -1, the string does not contain that symbol, so please check again. You can convert each symbol to an int and compare the values.
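The comparison suggested above can be sketched as a small helper that prints every code point of the string. This is only an illustration: the class name CodePointDump and the method describe are made up for this example, and the string literals use \u escapes so that the difference between ASCII and fullwidth parentheses stays visible in the source.

```java
final class CodePointDump {
    // Formats every code point of the input as "U+XXXX", separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // ASCII parentheses around the two characters from the question.
        System.out.println(describe("(\u6F22\u5B57)"));
        // Fullwidth parentheses (U+FF08, U+FF09) around the same characters.
        System.out.println(describe("\uFF08\u6F22\u5B57\uFF09"));
    }
}
```

If the dump of the real runtime string shows something other than U+0028 and U+0029 where the brackets are, that would explain why indexOf("(") returns -1.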

When I run your code with a couple of System.out.println statements added:

public class CJKText {
    public static void main(String[] args) {
        String str = "John (漢字)";
        int startIndex = str.indexOf("(");
        System.out.println("startIndex: " + startIndex);
        int endIndex = str.indexOf(")");
        System.out.println("endIndex: " + endIndex);
    }
}

the output is:

startIndex: 5
endIndex: 8

Please verify that the code posted is the code you are examining in your debugger. As a number of commenters have said, your actual code may contain characters that look like the Unicode parentheses U+0028 and U+0029 but are in fact different code points.
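If the runtime string turns out to contain fullwidth parentheses (U+FF08 and U+FF09), which is only a guess about the actual input, one possible workaround is to normalize the string with NFKC before searching. The sketch below uses java.text.Normalizer; the class name ParenExtractor and the method betweenParens are hypothetical. Note that NFKC also rewrites other compatibility characters (for example halfwidth katakana), so it may not suit every use case.

```java
import java.text.Normalizer;

final class ParenExtractor {
    // Returns the text between the first "(" and the next ")" after
    // NFKC normalization, or null if no such pair is found.
    static String betweenParens(String s) {
        // NFKC maps fullwidth parentheses to their ASCII counterparts.
        String normalized = Normalizer.normalize(s, Normalizer.Form.NFKC);
        int start = normalized.indexOf('(');
        int end = normalized.indexOf(')', start + 1);
        if (start < 0 || end < 0) {
            return null;
        }
        return normalized.substring(start + 1, end);
    }

    public static void main(String[] args) {
        // Input written with fullwidth parentheses via \u escapes.
        System.out.println(betweenParens("John \uFF08\u6F22\u5B57\uFF09"));
    }
}
```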

If you only need to extract the Kanji/Hanzi part, you could try something like this:

System.out.println( str.replaceAll("\\P{IsHan}+",""));

Oops!

This would not help if your brackets are also in the Han script...
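Since the question also mentions Japanese, a variant of the same idea could keep Hiragana and Katakana in addition to Han. This is a sketch, not a tested solution for every input; the class name CjkFilter and the method cjkOnly are made up for the example. As far as I know, Java classifies fullwidth parentheses and most punctuation under the Common script rather than Han, so they should be stripped as well, but it is worth checking against the real data.

```java
final class CjkFilter {
    // Removes every run of characters that is not Han, Hiragana, or Katakana.
    static String cjkOnly(String s) {
        return s.replaceAll("[^\\p{IsHan}\\p{IsHiragana}\\p{IsKatakana}]+", "");
    }

    public static void main(String[] args) {
        // Kanji inside fullwidth parentheses, written with \u escapes.
        System.out.println(cjkOnly("John \uFF08\u6F22\u5B57\uFF09"));
        // Hiragana inside ASCII parentheses.
        System.out.println(cjkOnly("Taro (\u305F\u308D\u3046)"));
    }
}
```

Punctuation and some marks shared between scripts fall under the Common script and are dropped by this filter, so the result contains only the ideographs and kana themselves.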


 