I want to determine whether the last character in a buffer, defined as the bytes between begin and end, is English or Japanese. I read about UTF-8, where Japanese characters are two bytes long and always have 1 in the high bit of the high byte, whereas the low byte can have either 1 or 0 in the high bit.
I am trying to return the integer 2 for Japanese (2 bytes), 1 for English, and 0 if the data in the buffer is malformed.
The method signature is public static int NumChars(byte begin, byte end). Can you point me in the right direction? I am confused about how to approach this. I was thinking about using XOR to check whether the MSB of the high byte is 1 and then return 2, but I am not sure I have understood this correctly.
Jeevan, a UTF-8 character can be anywhere from 1 to 4 bytes long.
So if you want to get 2 for Japanese characters, use a double-byte encoding such as Shift_JIS (SJIS) instead of UTF-8.
Example:
String j = "大";
System.out.println(j.getBytes("SJIS").length);
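To make the difference between the encodings concrete, here is a minimal sketch that encodes the same character in two charsets. The class name EncodedLength and the helper encodedLength are made up for illustration, and the Shift_JIS part assumes the JDK in use ships that charset, so it is guarded with Charset.isSupported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLength {
    /** Number of bytes the string occupies in the given charset. */
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String j = "\u5927"; // the kanji from the example above
        // In UTF-8 this character takes 3 bytes, not 2.
        System.out.println("UTF-8: " + encodedLength(j, StandardCharsets.UTF_8));
        // Assumption: Shift_JIS is available on this JDK; skip it otherwise.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS: " + encodedLength(j, Charset.forName("Shift_JIS")));
        }
    }
}
```

Passing a Charset object instead of a name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.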
There is a discussion about this in the thread "guessing-the-encoding-of-text-represented-as-byte-in-java".
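If the buffer really is UTF-8, the byte length of its last character can be found by walking backwards over the continuation bytes. The following is only a sketch under some assumptions: the signature takes a byte[] plus int offsets rather than the byte begin, byte end from the question, the names Utf8LastChar and lastCharLength are hypothetical, and the check is structural only (it does not reject every overlong or surrogate encoding).

```java
final class Utf8LastChar {
    /**
     * Returns the byte length (1 to 4) of the last UTF-8 character in
     * buf[begin, end), or 0 if the range is empty or its tail is malformed.
     */
    static int lastCharLength(byte[] buf, int begin, int end) {
        if (buf == null || begin < 0 || end > buf.length || begin >= end) {
            return 0;
        }
        // Walk back over continuation bytes (bit pattern 10xxxxxx).
        int i = end - 1;
        int continuation = 0;
        while (i >= begin && (buf[i] & 0xC0) == 0x80) {
            continuation++;
            i--;
            if (continuation > 3) {
                return 0; // no UTF-8 character has more than 3 continuation bytes
            }
        }
        if (i < begin) {
            return 0; // only continuation bytes, no lead byte
        }
        int lead = buf[i] & 0xFF;
        int expected;
        if (lead < 0x80) {
            expected = 1; // 0xxxxxxx: ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            expected = 2; // 110xxxxx
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            expected = 3; // 1110xxxx: most Japanese characters land here
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            expected = 4; // 11110xxx
        } else {
            expected = 0; // 0xC0, 0xC1 and 0xF5..0xFF are never valid lead bytes
        }
        // The lead byte must announce exactly the number of bytes we saw.
        return (expected == continuation + 1) ? expected : 0;
    }

    public static void main(String[] args) {
        byte[] buffer = {97, 98, 99, -29, -127, -126}; // "abc" followed by U+3042
        System.out.println(lastCharLength(buffer, 0, buffer.length)); // 3
        System.out.println(lastCharLength(buffer, 0, 3));             // 1
        System.out.println(lastCharLength(buffer, 0, 5));             // 0 (truncated)
    }
}
```

Note that a length of 1 only means ASCII, and a length of 2 or more only means non-ASCII; the length alone does not tell you that the character is Japanese.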
If you can get the buffer, or part of it, in string form, you can use regular expressions to match the character sets like this:
String english = ".*[\\x{20}-\\x{7E}]$";
String hiragana = ".*[\\x{3041}-\\x{3096}]$";
byte[] buffer = {97, 98, 99, -29, -127, -126}; //"abcあ"
System.out.println("buffer: "+Arrays.toString(buffer));
String s = new String(buffer,"utf-8") ;
System.out.println(s + " is hiragana=" + s.matches(hiragana));
System.out.println(s + " is english=" + s.matches(english));
s = "abcd";
System.out.println(s + " is hiragana=" + s.matches(hiragana));
System.out.println(s + " is english=" + s.matches(english));
Output:
buffer: [97, 98, 99, -29, -127, -126]
abcあ is hiragana=true
abcあ is english=false
abcd is hiragana=false
abcd is english=true
You will have to find out which Japanese character sets your program is using, such as Kanji, Hiragana, or Katakana. For more information, read the article "regular-expressions-for-japanese-text".
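As an alternative to hand-written ranges, the JDK can report the Unicode script of a code point through Character.UnicodeScript (available since Java 7). The sketch below uses hypothetical names (LastCharScript, lastScript, endsWithJapanese) and assumes that treating HAN, HIRAGANA and KATAKANA as "Japanese" is acceptable for your data; HAN also covers Chinese text, and some shared punctuation is reported as COMMON.

```java
final class LastCharScript {
    /** Unicode script of the last code point in s, or null if s is null or empty. */
    static Character.UnicodeScript lastScript(String s) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        // codePointBefore handles surrogate pairs at the end of the string.
        int cp = s.codePointBefore(s.length());
        return Character.UnicodeScript.of(cp);
    }

    /** Assumption: Kanji (HAN), Hiragana and Katakana together count as Japanese. */
    static boolean endsWithJapanese(String s) {
        Character.UnicodeScript script = lastScript(s);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA;
    }

    public static void main(String[] args) {
        System.out.println(lastScript("abc\u3042")); // HIRAGANA
        System.out.println(lastScript("abcd"));      // LATIN
        System.out.println(lastScript("\u5927"));    // HAN
        System.out.println(endsWithJapanese("abc\u30AB")); // true (Katakana)
    }
}
```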