Counting words from a text-file in Java
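I am writing a program that will scan a text file and count the number of words in it. The assignment's definition of a word is: 'A word is a non-empty string consisting only of letters (a,...,z,A,...,Z), surrounded by blanks, punctuation, hyphenation, line start, or line end.'
I am new to Java programming, and so far I have managed to write this instance method, which presumably should work. But it doesn't.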
public int wordCount() {
    int countWord = 0;
    String line = "";
    try {
        File file = new File("testtext01.txt");
        Scanner input = new Scanner(file);
        while (input.hasNext()) {
            line = line + input.next()+" ";
            input.next();
        }
        input.close();
        String[] tokens = line.split("[^a-zA-Z]+");
        for (int i=0; i<tokens.length; i++){
            countWord++;
        }
        return countWord;
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return -1;
}
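int wordCount = 0;
// input is a Scanner already opened on the file, as in the question
while (input.hasNextLine()) {
    String nextLine = input.nextLine();
    // scan the current line token by token (tokens are separated by whitespace)
    Scanner word = new Scanner(nextLine);
    while (word.hasNext()) {
        wordCount++;
        word.next();
    }
    word.close();
}
input.close();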
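Note that this counts whitespace-separated tokens, so something like "well-known" or "end." is counted as a single token, which is not quite the definition from the question. A minimal, self-contained sketch of the same loop, run on an in-memory string instead of a file so it is easy to try out (the class name TokenCount and the method countTokens are made up for illustration):

```java
import java.util.Scanner;

public class TokenCount {
    // Counts whitespace-separated tokens line by line, mirroring the loop above.
    static int countTokens(String text) {
        int wordCount = 0;
        try (Scanner input = new Scanner(text)) {
            while (input.hasNextLine()) {
                String nextLine = input.nextLine();
                try (Scanner word = new Scanner(nextLine)) {
                    while (word.hasNext()) {
                        wordCount++;
                        word.next();
                    }
                }
            }
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // "well-known" is one token here, although it contains two words
        // under the question's definition.
        System.out.println(countTokens("a well-known fact\nend."));
    }
}
```

If hyphenated words or digits matter for your assignment, you would probably want one of the regex-based approaches instead.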
The only word separators available in the file are spaces and hyphens. You can use a regex and the split() method.
int num_words = line.split("[\\s\\-]").length; //stores number of words
System.out.print("Number of words in file is "+num_words);
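REGEX (regular expression): \\s splits the string at whitespace/newline characters and \\- splits it at hyphens. So wherever there is a space, a newline or a hyphen, the sentence gets split. The extracted words are copied into an array and returned, and the length of that array is the number of words in the file.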
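One caveat with split(): a separator at the very start of the string (and, with a pattern that matches single characters, every pair of adjacent separators) produces an empty string in the array, which inflates the count. A minimal sketch that splits on runs of non-letters, following the question's definition of a word rather than only spaces and hyphens, and skips empty tokens (the class name SplitCount and the method countWords are hypothetical):

```java
public class SplitCount {
    // Counts letter-only words by splitting on runs of non-letter characters.
    static int countWords(String line) {
        int count = 0;
        for (String token : line.split("[^a-zA-Z]+")) {
            // A leading separator yields one empty token; skip it.
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  well-known  facts, 42 times"));
    }
}
```

Trailing empty strings are dropped by split() itself, so only the leading one needs the isEmpty() check here.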
You can use a Java regular expression.
You can read http://docs.oracle.com/javase/tutorial/essential/regex/groups.html to learn about groups.
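import java.io.File;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public int wordCount(){
    // one or more ASCII letters; note the upper bound is Z, not z
    String patternToMatch = "([a-zA-Z]+)";
    int countWord = 0;
    try {
        Pattern pattern = Pattern.compile(patternToMatch);
        File file = new File("abc.txt");
        Scanner sc = new Scanner(file);
        while(sc.hasNextLine()){
            Matcher matcher = pattern.matcher(sc.nextLine());
            // every match is one word
            while(matcher.find()){
                countWord++;
            }
        }
        sc.close();
    }catch(Exception e){
        e.printStackTrace();
    }
    // -1 signals that no words were found or the file could not be read
    return countWord > 0 ? countWord : -1;
}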
void run(String path)
    throws Exception
{
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8")))
    {
        int result = 0;
        while (true)
        {
            String line = reader.readLine();
            if (line == null)
            {
                break;
            }
            result += countWords(line);
        }
        System.out.println("Words in text: " + result);
    }
}

final Pattern pattern = Pattern.compile("[A-Za-z]+");

int countWords(String text)
{
    Matcher matcher = pattern.matcher(text);
    int result = 0;
    while (matcher.find())
    {
        ++result;
        System.out.println("Matcher found [" + matcher.group() + "]");
    }
    System.out.println("Words in line: " + result);
    return result;
}