How to get the word count of each text file in a folder in Java (without changing the order in which the folder is read)

Question

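My code below reads the .txt files in a folder (say the folder holds more than 2000 text files) and prints the total number of words in each text document.

If I read only 10-30 text files from the directory, the output lists the files in the correct order.

But when I add 2000+ text files and read them all at once from that folder, the order of the output gets scrambled (the files are printed in a seemingly random order).

Can anyone suggest a solution to this problem?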

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

public class duplicatestrings
{
    public static void main(String[] args)
    {
        // Accept only files whose names end in ".txt"
        FilenameFilter filter = new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith(".txt");
            }
        };

        File folder = new File("E:\\testfolder");
        // Note: listFiles() makes no guarantee about the order of the returned files
        File[] listOfFiles = folder.listFiles(filter);
        if (listOfFiles == null) {
            System.err.println("Not a readable directory: " + folder);
            return;
        }

        for (int i = 0; i < listOfFiles.length; i++) {
            File file1 = listOfFiles[i];
            int a = 0; // number of words in this file

            // try-with-resources closes the reader even if an exception is thrown
            try (BufferedReader ins = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file1)))) {
                String line;
                while ((line = ins.readLine()) != null) {
                    // Count the whitespace-separated tokens on this line
                    StringTokenizer st = new StringTokenizer(line);
                    while (st.hasMoreTokens()) {
                        st.nextToken();
                        a++;
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
                continue;
            }

            System.out.println(file1.getName() + " Total no of words=" + a);
        }
    }
}

I also need to get only the counts ("no of counts") of distinct and repeated words across all the text files in the folder (directory).

Suggestions welcome.

Answer 1

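After counting the words in each file, you can insert the results into a TreeMap and then display them in order. The key is the file name and the value is the word count. See: How to sort Map values by key in Java?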
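A minimal sketch of the TreeMap idea, assuming the same folder path and ".txt" filter as the question. The class name `WordCountsByName` and the helper methods `countWords` and `countAll` are hypothetical names chosen for illustration; words are counted as whitespace-separated tokens, as in the question's `StringTokenizer` loop.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical class name, used only for this sketch
public class WordCountsByName {

    // Counts whitespace-separated tokens in one file, line by line
    static int countWords(File file) throws IOException {
        int count = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                count += new StringTokenizer(line).countTokens();
            }
        }
        return count;
    }

    // Collects file name -> word count in a TreeMap, which keeps its keys sorted
    static Map<String, Integer> countAll(File folder) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        File[] files = folder.listFiles((dir, name) -> name.endsWith(".txt"));
        if (files == null) {
            throw new IOException("Not a readable directory: " + folder);
        }
        for (File f : files) {
            counts.put(f.getName(), countWords(f));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path, taken from the question
        File folder = new File(args.length > 0 ? args[0] : "E:\\testfolder");
        // Iterating a TreeMap visits entries in ascending key (file name) order
        for (Map.Entry<String, Integer> e : countAll(folder).entrySet()) {
            System.out.println(e.getKey() + " Total no of words=" + e.getValue());
        }
    }
}
```

Note that `String` keys sort lexicographically, so "file10.txt" comes before "file2.txt"; zero-padded file names or a custom `Comparator` passed to the `TreeMap` constructor would be needed for numeric order.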

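Alternatively, you can sort the file names in the folder and then count the words in each file of the sorted list. See: How to File.listFiles in alphabetical order?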
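A minimal sketch of the sort-first alternative, assuming the question's folder and ".txt" filter. `SortedFileWordCounts`, `sortedTxtFiles` and `countWords` are hypothetical names. Because `File.listFiles()` does not promise any order, the array is sorted explicitly by file name before counting.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import java.util.StringTokenizer;

// Hypothetical class name, used only for this sketch
public class SortedFileWordCounts {

    // Returns the .txt files in the folder, sorted by file name
    static File[] sortedTxtFiles(File folder) throws IOException {
        File[] files = folder.listFiles((dir, name) -> name.endsWith(".txt"));
        if (files == null) {
            throw new IOException("Not a readable directory: " + folder);
        }
        // listFiles() guarantees no order, so impose one explicitly
        Arrays.sort(files, Comparator.comparing(File::getName));
        return files;
    }

    // Counts whitespace-separated tokens in one file, line by line
    static int countWords(File file) throws IOException {
        int count = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                count += new StringTokenizer(line).countTokens();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path, taken from the question
        File folder = new File(args.length > 0 ? args[0] : "E:\\testfolder");
        for (File f : sortedTxtFiles(folder)) {
            System.out.println(f.getName() + " Total no of words=" + countWords(f));
        }
    }
}
```

Sorting by `getName()` rather than relying on `File`'s own `compareTo` keeps the order independent of the directory path; as with the TreeMap approach, the comparison is lexicographic, not numeric.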

Answer 2

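I think the logic below will help you: add your file-reading code and replace the "test" variable with each line of the file.

Counting the total number of words, and the number of words without repetition: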

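import java.util.HashSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount {   // hypothetical wrapper class so the snippet compiles
    public static void main(String[] args) {
        String test = "I am trying to make make make";
        // A "word" is a run of letters, digits or underscores
        Pattern p = Pattern.compile("\\w+");
        Matcher m = p.matcher(test);
        HashSet<String> hs = new HashSet<>();   // each distinct word is kept once
        int i = 0;                              // total number of words
        while (m.find()) {
            i++;
            hs.add(m.group());
        }
        System.out.println("Total words Count==" + i);
        System.out.println("Count without Repetition ==" + hs.size());
    }
}

Output:

Total words Count==7
Count without Repetition ==5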
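Following the suggestion to feed each line of each file through the same pattern, here is a minimal sketch that aggregates over every .txt file in the folder. The class `FolderWordStats` and its methods are hypothetical names; "repeated" is assumed to mean words that occur more than once across all files, which is one possible reading of the question. A frequency map is used instead of a `HashSet` so that repeated words can be counted too.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch
public class FolderWordStats {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Adds every \w+ match in the line to the word -> occurrences map
    static void addLine(String line, Map<String, Integer> freq) {
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            freq.merge(m.group(), 1, Integer::sum);
        }
    }

    // Total number of word occurrences
    static long total(Map<String, Integer> freq) {
        return freq.values().stream().mapToLong(Integer::longValue).sum();
    }

    // Number of distinct words that occur more than once (assumed meaning of "repeated")
    static long repeated(Map<String, Integer> freq) {
        return freq.values().stream().filter(c -> c > 1).count();
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path, taken from the question
        File folder = new File(args.length > 0 ? args[0] : "E:\\testfolder");
        File[] files = folder.listFiles((dir, name) -> name.endsWith(".txt"));
        if (files == null) {
            throw new IOException("Not a readable directory: " + folder);
        }
        Map<String, Integer> freq = new HashMap<>();
        for (File f : files) {
            try (BufferedReader in = new BufferedReader(new FileReader(f))) {
                String line;
                while ((line = in.readLine()) != null) {
                    addLine(line, freq);
                }
            }
        }
        System.out.println("Total words: " + total(freq));
        System.out.println("Distinct words: " + freq.size());
        System.out.println("Repeated words: " + repeated(freq));
    }
}
```

For the sample sentence "I am trying to make make make" this gives 7 total, 5 distinct and 1 repeated word ("make"). The match is case-sensitive, so "The" and "the" count as different words unless each line is lower-cased first.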

Hope this helps :)


Answer 1: score 0, posted 2016-01-18 07:41:34

Answer 2: score 0, posted 2016-09-26 11:49:48
