Java : Skip Unicode characters while reading a file

Question

I am reading a text file using the code below:

try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
    for (String line; (line = br.readLine()) != null; ) {
        // I want to skip a line that has a Unicode character and continue with the next line
        if (line.toLowerCase().startsWith("\\u")) {
            // This does not work, because the line holds the character itself and not the escaped text
            continue;
        }
    }
}

The text file :

How can I skip all the Unicode characters while reading a file?

Answer 1

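You can skip all lines that contain non-ASCII characters. canEncode returns true when the whole line is ASCII, so the check has to be negated:

if (!Charset.forName("US-ASCII").newEncoder().canEncode(line)) {
    continue; // the line has at least one non-ASCII character
}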
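
As a rough sketch, this is how the encoder check could fit into the read loop from the question. The class name AsciiLineFilter, the helpers isAscii and asciiOnlyLines, and the sample input are made up for illustration, and a StringReader stands in for the real file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AsciiLineFilter {

    // True when every character of the line can be encoded as US-ASCII.
    static boolean isAscii(String line) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(line);
    }

    // Reads all lines and keeps only the pure-ASCII ones (hypothetical helper).
    static List<String> asciiOnlyLines(BufferedReader br) throws IOException {
        List<String> kept = new ArrayList<>();
        for (String line; (line = br.readLine()) != null; ) {
            if (!isAscii(line)) {
                continue; // skip lines containing any non-ASCII character
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Sample input standing in for the file; the second line ends in U+00E9.
        String sample = "plain\ncaf\u00e9\nalso plain";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            System.out.println(asciiOnlyLines(br)); // prints [plain, also plain]
        }
    }
}
```

For a real file, the non-ASCII characters only decode correctly if the reader uses the file's actual encoding. If that encoding is known, Files.newBufferedReader(path, StandardCharsets.UTF_8) (or whichever charset applies) is probably safer than FileReader, which on older Java versions falls back to the platform default.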

Answer 2

All characters in a String are Unicode: a String is a counted sequence of UTF-16 code units. By "Unicode" you presumably mean characters that are not in some other, unspecified character set. For the sake of argument, let's say that set is ASCII.

A regular expression can sometimes be the simplest expression of a pattern requirement:

if (!line.matches("\\p{ASCII}*")) continue;

That is, if the string does not consist entirely of zero or more (that's what * means) ASCII characters, then continue.

(String.matches tries to match the whole string, so the pattern behaves as if it were written ^\\p{ASCII}*$.)
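
String.matches recompiles the pattern on every call, so when reading many lines it may be worth compiling it once. The sketch below does that. It also shows a variant that removes the non-ASCII characters and keeps the rest of the line, in case that is closer to what is wanted. The class and method names are hypothetical:

```java
import java.util.regex.Pattern;

final class AsciiRegex {

    // Compiled once; Matcher.matches() requires the whole input to match.
    private static final Pattern ASCII_ONLY = Pattern.compile("\\p{ASCII}*");

    // Matches any single character outside the ASCII range.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // True when the line is empty or contains only ASCII characters.
    static boolean isAscii(String line) {
        return ASCII_ONLY.matcher(line).matches();
    }

    // Drops the non-ASCII characters and keeps everything else.
    static String stripNonAscii(String line) {
        return NON_ASCII.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isAscii("hello"));           // prints true
        System.out.println(isAscii("caf\u00e9"));       // prints false
        System.out.println(stripNonAscii("caf\u00e9")); // prints caf
    }
}
```

In the read loop this would be used as: if (!AsciiRegex.isAscii(line)) continue;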

Answer 3

Something like this might get you going:

for (char c : line.toCharArray()) {
    if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.BASIC_LATIN) {
        // do something with this character
    }
}

You could use that as a starting point to either discard each non-basic character, or discard the entire line if it contains a single non-basic character.
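
Both options could be sketched roughly as follows. The class and method names are invented for the example. BASIC_LATIN covers U+0000 through U+007F, the same range as ASCII, and the surrogate halves of characters outside the BMP belong to other blocks, so they are dropped as well:

```java
final class BasicLatinFilter {

    // True when the character belongs to the Basic Latin block (U+0000..U+007F).
    static boolean isBasicLatin(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.BASIC_LATIN;
    }

    // Option 1: discard each non-basic character, keep the rest of the line.
    static String keepBasicLatin(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        for (char c : line.toCharArray()) {
            if (isBasicLatin(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Option 2: decide whether to discard the entire line.
    static boolean isAllBasicLatin(String line) {
        for (char c : line.toCharArray()) {
            if (!isBasicLatin(c)) {
                return false; // one non-basic character is enough
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(keepBasicLatin("na\u00efve caf\u00e9")); // prints nave caf
        System.out.println(isAllBasicLatin("plain text"));          // prints true
        System.out.println(isAllBasicLatin("caf\u00e9"));           // prints false
    }
}
```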

3 answers

solution1
0 2019-07-11 10:54:34

solution2
0 2019-07-11 21:59:25

solution3
0 2019-07-12 03:43:51
