Java: Skip Unicode characters while reading a file
I am reading a text file using the code below:
try (BufferedReader br = new BufferedReader(new FileReader(<file.txt>))) {
    for (String line; (line = br.readLine()) != null;) {
        // I want to skip a line with a Unicode character and continue with the next line
        if (line.toLowerCase().startsWith("\\u")) {
            continue;
            // This is not working because I get the character itself and not the text
        }
    }
}
The text file:
How can I skip all the Unicode characters while reading a file?
You can skip all lines that contain non-ASCII characters:
if (!Charset.forName("US-ASCII").newEncoder().canEncode(line)) {
    continue;
}
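As a minimal, self-contained sketch of the encoder-based check, the following keeps the lines that US-ASCII can encode and skips the rest. The class name AsciiLineFilter, its helper methods, and the sample lines are hypothetical and chosen only for illustration; a list of strings stands in for the lines read from the file.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsciiLineFilter {
    // True when every character of the line can be encoded as US-ASCII.
    static boolean isAscii(String line) {
        // A fresh encoder per call: CharsetEncoder instances are not thread-safe.
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return encoder.canEncode(line);
    }

    // Returns the lines that are pure ASCII, in their original order.
    static List<String> keepAsciiLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isAscii(line)) {
                continue; // skip: at least one character is outside ASCII
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines standing in for the file contents.
        List<String> lines = Arrays.asList("plain line", "caf\u00e9", "another line");
        System.out.println(keepAsciiLines(lines)); // [plain line, another line]
    }
}
```

In a real read loop, the same isAscii test would sit directly inside the for loop over br.readLine().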
All characters in a String are Unicode. A String is a counted sequence of UTF-16 code units. By "Unicode", you must mean characters that are not also in some unspecified set of other character sets. For the sake of argument, let's say ASCII.
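To illustrate the point about UTF-16 code units, here is a small sketch. The class name Utf16Demo and the sample string are made up for this example. It shows that an escape in source code becomes the character itself at compile time (which is why a check like startsWith("\\u") never fires on file contents), and that length() counts code units rather than characters.

```java
class Utf16Demo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', e-acute, and an emoji that is stored as a surrogate pair.
        String s = "a\u00e9\uD83D\uDE00";
        System.out.println(codeUnits(s));      // 4
        System.out.println(codePoints(s));     // 3
        // The string holds the characters themselves, not backslash-u text.
        System.out.println(s.contains("\\u")); // false
    }
}
```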
A regular expression can sometimes be the simplest expression of a pattern requirement:
if (!line.matches("\\p{ASCII}*")) continue;
That is, if the string does not consist solely of any number (including zero; that's what * means) of "ASCII" characters, then continue. (String.matches looks for a match on the whole string, so the pattern behaves as if it were anchored: ^\\p{ASCII}*$ in Java string-literal form.)
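The whole-string behaviour can be checked with a short sketch. The class AsciiRegexDemo and its method name are hypothetical. It precompiles the pattern, since String.matches compiles its regex on each call, and contrasts matches() with find(), which is satisfied by a partial match.

```java
import java.util.regex.Pattern;

class AsciiRegexDemo {
    // Compiled once and reused for every line.
    private static final Pattern ASCII_ONLY = Pattern.compile("\\p{ASCII}*");

    // True only when the entire line consists of ASCII characters.
    static boolean isAsciiOnly(String line) {
        return ASCII_ONLY.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAsciiOnly(""));           // true (zero characters)
        System.out.println(isAsciiOnly("hello"));      // true
        System.out.println(isAsciiOnly("h\u00e9llo")); // false
        // find() succeeds on the leading "h" alone, so it is the wrong test here.
        System.out.println(ASCII_ONLY.matcher("h\u00e9llo").find()); // true
    }
}
```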
Something like this might get you going:
for (char c : line.toCharArray()) {
    if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.BASIC_LATIN) {
        // do something with this character
    }
}
You could use that as a starting point to either discard each non-basic character, or discard the entire line if it contains even a single non-basic character.
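Both options can be sketched as follows. The class BasicLatinFilter and its method names are hypothetical, and the sketch iterates over code points instead of char values (an assumption beyond the loop above) so that surrogate pairs are treated as one character.

```java
class BasicLatinFilter {
    private static boolean isBasicLatin(int codePoint) {
        // UnicodeBlock.of returns null for unassigned code points; == handles that.
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.BASIC_LATIN;
    }

    // Option 1: drop every character outside the Basic Latin block.
    static String stripNonBasicLatin(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        line.codePoints()
            .filter(BasicLatinFilter::isBasicLatin)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Option 2: decide whether the whole line should be kept.
    static boolean isBasicLatinOnly(String line) {
        return line.codePoints().allMatch(BasicLatinFilter::isBasicLatin);
    }

    public static void main(String[] args) {
        String line = "caf\u00e9 au lait";
        System.out.println(stripNonBasicLatin(line)); // caf au lait
        System.out.println(isBasicLatinOnly(line));   // false
        System.out.println(isBasicLatinOnly("tea"));  // true
    }
}
```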