Java string with English and Chinese text

Question

I have a runtime string that could contain English text together with Chinese or Japanese text, e.g. John (漢字). I want to parse this text and extract the non-English characters.

Calling indexOf for the brackets returns -1. Could anyone point me in the right direction?

String str = "John (漢字)";
int startIndex = str.indexOf("(");
int endIndex = str.indexOf(")");

Answer 1

Your code runs fine when I try it.

If indexOf returns -1, the string does not contain that exact character, so please check your input again. You can convert each character to an int and compare the values to see which characters are really there.
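To make the "convert to int and compare" idea concrete, here is a minimal sketch that prints the code point of every character in a string. The class name CodePointDump and the helper describe are made up for illustration, and it assumes Java 8 or later for String.codePoints().

```java
public class CodePointDump {
    // Returns one "U+XXXX" entry per code point, separated by single spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // ASCII left parenthesis
        System.out.println(describe("("));      // prints U+0028
        // Fullwidth left parenthesis, which looks similar on screen
        System.out.println(describe("\uFF08")); // prints U+FF08
    }
}
```

Running describe on the real runtime string should show whether the brackets are U+0028 and U+0029 or some other code points.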

Answer 2

When I run your code with a couple of System.out.println statements added:

public class CJKText {
    public static void main(String[] args) {
        String str = "John (漢字)";
        int startIndex = str.indexOf("(");
        System.out.println("startIndex: " + startIndex);
        int endIndex = str.indexOf(")");
        System.out.println("endIndex: " + endIndex);
    }
}

the output is:

startIndex: 5
endIndex: 8

Please verify that the code you posted is the code you are examining in your debugger. As a number of commenters have said, your actual string may contain characters that look like the ASCII parentheses U+0028 and U+0029 but are in fact different code points, for example the fullwidth parentheses U+FF08 and U+FF09.
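If the runtime string might use lookalike brackets, one option is to accept both forms when searching. The following is only a sketch under that assumption: BracketFinder, between and firstIndexOfAny are hypothetical names, and it handles just the ASCII and fullwidth (U+FF08, U+FF09) parentheses. The kanji are written as Unicode escapes so the file compiles regardless of source encoding.

```java
public class BracketFinder {
    // Returns the text between the first opening parenthesis and the next
    // closing one, accepting ASCII and fullwidth forms; null if either is missing.
    static String between(String s) {
        int start = firstIndexOfAny(s, 0, '(', '\uFF08');
        if (start < 0) {
            return null;
        }
        int end = firstIndexOfAny(s, start + 1, ')', '\uFF09');
        if (end < 0) {
            return null;
        }
        return s.substring(start + 1, end);
    }

    // Index of the first occurrence of either character at or after 'from', or -1.
    static int firstIndexOfAny(String s, int from, char a, char b) {
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == a || c == b) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The string from the question, with ASCII parentheses
        System.out.println(between("John (\u6F22\u5B57)"));
        // The same text with fullwidth parentheses
        System.out.println(between("John \uFF08\u6F22\u5B57\uFF09"));
    }
}
```

Both calls in main should print the two kanji from the question. Other bracket-like characters would need to be added to the search if they turn up in the data.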

Answer 3

If you only need to extract the Kanji/Hanzi part, you could try something like this:

System.out.println( str.replaceAll("\\P{IsHan}+",""));

Oops!

This would not help if your brackets are also in the Han script...
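Since the question also mentions Japanese, a variant that collects matches instead of deleting everything else might look like the sketch below. It assumes that Han, Hiragana and Katakana are the scripts of interest and that Java 7 or later is used for the \p{Is...} script properties. CjkExtractor and extract are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CjkExtractor {
    // Matches runs of characters whose Unicode script is Han, Hiragana or Katakana.
    private static final Pattern CJK =
            Pattern.compile("[\\p{IsHan}\\p{IsHiragana}\\p{IsKatakana}]+");

    // Concatenates every matching run found in the input, in order.
    static String extract(String s) {
        StringBuilder sb = new StringBuilder();
        Matcher m = CJK.matcher(s);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The string from the question, kanji written as Unicode escapes
        System.out.println(extract("John (\u6F22\u5B57)"));
    }
}
```

This does not depend on which kind of brackets surround the text, because it never looks for them.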

3 answers

solution1
2 ACCEPTED 2017-10-20 02:55:14

solution2
1 2017-10-20 02:58:44

solution3
1 2017-10-20 03:11:21
