Check if word contains a number or special character
I am writing a program to count the total number of valid English words in a text file. In this code, I want to ignore words that contain digits or special characters, e.g. "word123", "123word", "word&&", "$name". Currently my program rejects words that start with a number, e.g. "123number", but it does not reject "number123". Can anyone tell me how I should move forward? Below is my code:
public int wordCounter(String filePath) throws FileNotFoundException {
    File f = new File(filePath);
    Scanner scanner = new Scanner(f);
    int nonWord = 0;
    int count = 0;
    String regex = "[a-zA-Z].*";
    while (scanner.hasNext()) {
        String word = scanner.next();
        if (word.matches(regex)) {
            count++;
        } else {
            nonWord++;
        }
    }
    return count;
}
Lose the dot:
String regex = "[a-zA-Z]*"; // more correctly "[a-zA-Z]+", but both will work here
The dot means "any character", but you want a regex that means "only composed of letters".
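To make the difference concrete, here is a small sketch that runs the sample tokens from the question through both patterns. The class name `LetterOnlyDemo` and its constants are made up for illustration; only `String.matches` comes from the original code.

```java
final class LetterOnlyDemo {
    // Pattern from the question: one ASCII letter, then any characters at all.
    static final String ORIGINAL = "[a-zA-Z].*";
    // Pattern from this answer: one or more ASCII letters and nothing else.
    static final String LETTERS_ONLY = "[a-zA-Z]+";

    public static void main(String[] args) {
        String[] tokens = {"word", "number123", "123number", "word&&", "$name"};
        for (String token : tokens) {
            // String.matches requires the WHOLE token to match the pattern.
            System.out.println(token
                    + " original=" + token.matches(ORIGINAL)
                    + " lettersOnly=" + token.matches(LETTERS_ONLY));
        }
    }
}
```

With the original pattern, "number123" and "word&&" are accepted because `.*` swallows everything after the first letter. With the letters-only pattern, only "word" is accepted.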
BTW, you can also express this more succinctly (although arguably less readably) using a Unicode property class:
String regex = "\\p{L}+";
The regex \p{L} (written "\\p{L}" in a Java string literal) means "any letter". Note that this covers letters from every script, not just a-z and A-Z.
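The following sketch contrasts the ASCII-only class with `\p{L}` on a few tokens. The class name `UnicodeLetterDemo` and the sample words are assumptions chosen for illustration; the non-ASCII words are written with Unicode escapes so the file encoding does not matter.

```java
final class UnicodeLetterDemo {
    static final String ASCII_LETTERS = "[a-zA-Z]+";
    static final String ANY_LETTER = "\\p{L}+";

    public static void main(String[] args) {
        String[] tokens = {
            "word",                           // plain ASCII
            "caf\u00e9",                      // "cafe" with an accented final e
            "\u0441\u043b\u043e\u0432\u043e", // a word written in Cyrillic
            "word123"                         // contains digits
        };
        for (String token : tokens) {
            System.out.println(token
                    + " ascii=" + token.matches(ASCII_LETTERS)
                    + " anyLetter=" + token.matches(ANY_LETTER));
        }
    }
}
```

If the goal is strictly English words, the ASCII class may be the better fit, since `\p{L}+` also accepts the accented and Cyrillic tokens. Both patterns reject "word123".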
To extend the expression to include the apostrophe, which can appear at the start (e.g. 'tis), in the middle (e.g. can't) or at the end (e.g. Jesus'), but not more than once:
String regex = "(?!([^']*'){2})['\\p{L}]+";
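As a rough check of how this pattern behaves, the sketch below applies it to the examples mentioned above plus a few tokens that should be rejected. The class name `ApostropheDemo` and the extra tokens are illustrative assumptions.

```java
final class ApostropheDemo {
    // Negative lookahead: fail if the token contains two apostrophes.
    // After that, require one or more characters that are letters or apostrophes.
    static final String WORD_WITH_APOSTROPHE = "(?!([^']*'){2})['\\p{L}]+";

    static boolean isWord(String token) {
        return token.matches(WORD_WITH_APOSTROPHE);
    }

    public static void main(String[] args) {
        String[] tokens = {"'tis", "can't", "Jesus'", "rock'n'roll", "word123", "'"};
        for (String token : tokens) {
            System.out.println(token + " -> " + isWord(token));
        }
    }
}
```

One quirk worth knowing: a token consisting of a single apostrophe is also accepted, because the character class does not require at least one letter. If that matters for your input, it would need an extra check.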
Use the regex ^[a-zA-Z-]+$ for word matching. It accepts only letters and hyphens, so hyphenated words also count.
public int wordCounter(String filePath) throws FileNotFoundException
{
    File f = new File(filePath);
    Scanner scanner = new Scanner(f);
    int nonWord = 0;
    int count = 0;
    // Only letters and hyphens are allowed, from the first character to the last.
    String regex = "^[a-zA-Z-]+$";
    while (scanner.hasNext()) {
        String word = scanner.next();
        if (word.matches(regex)) {
            count++;
        } else {
            nonWord++;
        }
    }
    return count;
}
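For experimenting without a file, here is a minimal variant that counts words in a string instead. The class `HyphenWordCounter` and the method `countWords` are hypothetical names, not part of the answer above. The sketch drops the ^ and $ anchors because `Matcher.matches` (like `String.matches`) already requires the whole token to match, and it precompiles the pattern so it is not recompiled for every token.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

final class HyphenWordCounter {
    // Same character class as above; anchors are implied by matches().
    private static final Pattern WORD = Pattern.compile("[a-zA-Z-]+");

    // Counts whitespace-separated tokens made up only of letters and hyphens.
    static int countWords(String text) {
        int count = 0;
        // try-with-resources closes the Scanner when the loop is done.
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                if (WORD.matcher(scanner.next()).matches()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("a well-known word123 $name fact"));
    }
}
```

Be aware that a token made only of hyphens, such as "--", also counts as a word with this character class.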