
Counting words from a text-file in Java

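I'm writing a program that will scan a text file and count the number of words in it. The definition of a word for the assignment is: 'A word is a non-empty string consisting only of letters (a,...,z,A,...,Z), surrounded by blanks, punctuation, hyphenation, line start, or line end.'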

I'm new to Java programming, and so far I've managed to write this instance method, which presumably should work. But it doesn't.

public int wordCount() {
    int countWord = 0;
    String line = "";
    try {
        File file = new File("testtext01.txt");
        Scanner input = new Scanner(file);

        while (input.hasNext()) {
            line = line + input.next()+" ";
            input.next(); // this second next() reads the following token and discards it
        }
        input.close();
        String[] tokens = line.split("[^a-zA-Z]+");
        for (int i=0; i<tokens.length; i++){
            countWord++;
        }
        return countWord;

    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return -1;
}
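In the question's loop, input.next() is called twice per iteration, so every second token is read and thrown away instead of being appended to line. There is a second, smaller problem: when the text starts with a non-letter (for example a quotation mark), split() returns an empty first element, and the counting loop counts it as a word. Below is a minimal corrected sketch of the same approach. The class name AskerWordCount and the helper countWords are illustrative and do not come from the question. It treats a word as a maximal run of ASCII letters, following the assignment's definition.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class AskerWordCount {

    // Counts words, where a word is a maximal run of ASCII letters (a-z, A-Z).
    static int countWords(String text) {
        int count = 0;
        // Every run of non-letters acts as one separator.
        for (String token : text.split("[^a-zA-Z]+")) {
            // Text that starts with a non-letter yields an empty first token.
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Same structure as the question's method, but next() is called
    // exactly once per iteration, so no token is skipped.
    static int wordCount(File file) throws FileNotFoundException {
        StringBuilder text = new StringBuilder();
        try (Scanner input = new Scanner(file)) {
            while (input.hasNext()) {
                text.append(input.next()).append(' ');
            }
        }
        return countWords(text.toString());
    }

    public static void main(String[] args) throws FileNotFoundException {
        // File name from the question; a command-line argument overrides it.
        File file = new File(args.length > 0 ? args[0] : "testtext01.txt");
        System.out.println(countWords("\"Hello, world!\" - it's well-known.")); // 6
        if (file.exists()) {
            System.out.println(wordCount(file));
        }
    }
}
```

Under this definition the apostrophe is punctuation, so "it's" counts as two words ("it" and "s"), and "well-known" also counts as two.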

Referenced from: Word count in a text file?

    int wordCount = 0;

    while (input.hasNextLine()){

       String nextLine = input.nextLine();
       Scanner word = new Scanner(nextLine); // scans the words of the current line

       while(word.hasNext()){
          wordCount++;    
          word.next();
       }
       word.close();
    }
    input.close();
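The snippet above assumes that input is a Scanner already opened on the file. The sketch below wraps the same nested-Scanner loop in a self-contained method. To keep it easy to try, it reads from a String (Scanner has a String constructor). To read a file instead, build the outer Scanner with new Scanner(new File(path)). The class name WhitespaceWordCount is illustrative. This approach counts whitespace-separated tokens, so "well-known" and "42" each count as one token, which differs from the assignment's letters-only definition.

```java
import java.util.Scanner;

public class WhitespaceWordCount {

    // Counts whitespace-separated tokens, line by line.
    static int countTokens(String text) {
        int wordCount = 0;
        try (Scanner input = new Scanner(text)) {
            while (input.hasNextLine()) {
                String nextLine = input.nextLine();
                // A second Scanner splits the line on whitespace, its default delimiter.
                try (Scanner word = new Scanner(nextLine)) {
                    while (word.hasNext()) {
                        wordCount++;
                        word.next();
                    }
                }
            }
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // Tokens: "A", "well-known", "fact:", "42", "is", "even."
        System.out.println(countTokens("A well-known fact:\n42 is even.")); // 6
    }
}
```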

The only word separators in the file are spaces and hyphens. You can use a regex with the split() method.

int num_words = line.trim().split("[\\s\\-]+").length; //stores number of words; "+" treats a run of separators as one split point
System.out.print("Number of words in file is "+num_words);

REGEX (regular expression):

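\\s splits the string at spaces/newlines and \\- splits it at hyphens. So the sentence is split wherever there is a space, a newline, or a hyphen. split() returns the extracted words as an array, and the length of that array is the number of words in the file.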
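A character class without a quantifier, such as [\\s\\-], matches exactly one separator. As a result, two adjacent separators (a double space, or " - ") leave an empty string between them in the array and inflate the count. A leading separator adds an empty first element as well. The sketch below shows the difference. It splits on [\\s\\-]+ and counts only the non-empty pieces. The class name SplitWordCount is illustrative.

```java
import java.util.Arrays;

public class SplitWordCount {

    // Splits on runs of whitespace or hyphens and ignores empty pieces
    // (a leading separator still produces one empty first element).
    static int countWords(String line) {
        return (int) Arrays.stream(line.split("[\\s\\-]+"))
                .filter(token -> !token.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        String line = " one  two - three";
        // Without "+", every single separator is a split point, so the
        // leading space and the runs of separators produce empty strings.
        System.out.println(line.split("[\\s\\-]").length); // 7
        System.out.println(countWords(line));              // 3
    }
}
```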

You can use Java regular expressions.
See http://docs.oracle.com/javase/tutorial/essential/regex/groups.html to learn about groups.



    public int wordCount() {
        // two separate ranges; a single range such as A-z would also match [ \ ] ^ _ `
        String patternToMatch = "([a-zA-Z]+)";
        int countWord = 0;
        try {
            Pattern pattern = Pattern.compile(patternToMatch);
            File file = new File("abc.txt");
            Scanner sc = new Scanner(file);
            while (sc.hasNextLine()) {
                Matcher matcher = pattern.matcher(sc.nextLine());
                // each find() locates the next run of letters, i.e. the next word
                while (matcher.find()) {
                    countWord++;
                }
            }
            sc.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return countWord > 0 ? countWord : -1;
    }

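A common slip in this kind of pattern is writing A-z instead of A-Z. Inside a character class, A-z covers every code point from 'A' to 'z', and that range includes the six ASCII characters [ \ ] ^ _ ` that sit between 'Z' and 'a'. The sketch below compares the two patterns on a short string. The class, field, and method names are illustrative. It returns 0 rather than -1 when nothing matches; that is a choice made for this sketch, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWordCount {

    // Two separate ranges: ASCII letters only.
    static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    // One range from 'A' to 'z': also matches [ \ ] ^ _ and the backtick.
    static final Pattern TYPO_RANGE = Pattern.compile("[a-zA-z]+");

    // Counts non-overlapping matches of the pattern in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "snake_case [draft]";
        System.out.println(countMatches(WORD, text));       // 3: snake, case, draft
        System.out.println(countMatches(TYPO_RANGE, text)); // 2: snake_case, [draft]
    }
}
```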
void run(String path)
throws Exception
{
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8")))
    {
        int result = 0;

        while (true)
        {
            String line = reader.readLine();

            if (line == null)
            {
                break;
            }

            result += countWords(line);
        }

        System.out.println("Words in text: " + result);
    }
}

final Pattern pattern = Pattern.compile("[A-Za-z]+");

int countWords(String text)
{
    Matcher matcher = pattern.matcher(text);

    int result = 0;

    while (matcher.find())
    {
        ++result;

        System.out.println("Matcher found [" + matcher.group() + "]");
    }

    System.out.println("Words in line: " + result);

    return result;
}
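The line-by-line answer above can be condensed into a small, testable class. The per-line counting is unchanged. The file is opened with Files.newBufferedReader and StandardCharsets.UTF_8, both part of the standard library since Java 7. The loop uses the common while ((line = reader.readLine()) != null) idiom in place of while (true) with break. The class and method names are illustrative, and the per-match debug output has been removed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLineWordCount {

    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Counts runs of ASCII letters in a single line.
    static int countWordsInLine(String line) {
        Matcher matcher = WORD.matcher(line);
        int result = 0;
        while (matcher.find()) {
            result++;
        }
        return result;
    }

    // Sums the per-line counts until readLine() returns null at end of input.
    // The caller owns the reader and is responsible for closing it.
    static int countWordsIn(BufferedReader reader) throws IOException {
        int result = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            result += countWordsInLine(line);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Reads the file named on the command line, decoding it as UTF-8.
            try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                System.out.println("Words in text: " + countWordsIn(reader));
            }
        } else {
            // No path given: count a small in-memory sample instead.
            try (BufferedReader reader = new BufferedReader(new StringReader("First line.\nSecond-line, here!"))) {
                System.out.println("Words in text: " + countWordsIn(reader)); // Words in text: 5
            }
        }
    }
}
```

Passing a BufferedReader instead of a path keeps the counting logic independent of where the text comes from, which is why the in-memory sample and the file share the same method.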


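Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.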

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM