
Count the number of unique words in a text file (not allowed to use Hash)

How can I count repeated words in a text file using an array?

My program can print the total number of words in the file, but how can I get it to print the number of different words and also print a list of how many times each word is repeated, like this:

Cake: 4
a: 320
Piece: 2
of: 24

(Words with capital letters and small letters are considered the same word)

void FileReader() {
    System.out.println("Oppgave A");
    int totalWords = 0;
    int uniqueWords = 0;
    String[] word = new String[35000];
    String[] wordC = new String[3500];
    try {
        File fr = new File("Alice.txt");
        Scanner sc = new Scanner(fr);

        while (sc.hasNext()) {
            String words = sc.next();
            String[] space = words.split(" ");
            String[] comma = words.split(",");
            totalWords++;
        }
        System.out.println("Antall ord som er lest er: " + totalWords);
    } catch (Exception e) {
        System.out.println("File not found");
    }
}

That would be very inefficient with an array, because for each word you would have to iterate through the array to see whether the word has already occurred. Instead, use a HashMap where the key is the word and the value is the number of occurrences. Checking whether a HashMap contains a key is easier and faster than checking whether an array contains an element.

EDIT:

HashMap<String, Integer>
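
As a rough sketch of the HashMap idea (it does not satisfy the "no Hash" restriction in the question), the counting could look like the following. The class name `HashMapWordCount`, the method `count`, and the regular expression used to split words are assumptions made for illustration, not part of the original post.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapWordCount {
    // Counts how often each word occurs, treating upper and lower case as equal.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: a "word" is a run of letters or apostrophes.
        for (String token : text.split("[^\\p{L}']+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with a delimiter
            }
            // merge() inserts 1 for a new key, otherwise adds 1 to the stored count
            counts.merge(token.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("A piece of cake, a Piece of CAKE");
        System.out.println("Unique words: " + counts.size());
        counts.forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

The size of the map is the number of distinct words, and each value is that word's frequency.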

Try using a Set and checking the return value of add() as you iterate.

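// needs java.util.Set, java.util.HashSet and java.util.Arrays
Set<String> set = new HashSet<>(); // start empty; a pre-filled set would make add() always return false
int unique = 0;
for (String temp : word) {
    // add() returns true only the first time a value is inserted
    if (set.add(temp)) {
        unique++;
    }
}

//or...
Set<String> set = new HashSet<>(Arrays.asList(word));
int unique = set.size();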

This assumes, of course, that all the words have already been read into the array.

Edit: Since you can't use Maps (and, I assume, other hash-based data structures), you may have to fall back on the somewhat crude approach of checking every value.

// input is the new word just read from the text file
boolean isUnique = true;
// assumption: word[0..unique-1] holds the distinct words stored so far
for (int i = 0; i < unique; i++) {
    if (word[i].equalsIgnoreCase(input)) {
        isUnique = false;
        break;
    }
}
if (isUnique) {
    word[unique] = input; // remember the new word
    unique++; // unique is the count of unique words
}
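
The check above can be extended to keep a count per word by using a second array in parallel with the first. This is only a sketch under assumptions: the class `ArrayWordCount`, its field names, and the `countOf` helper are hypothetical, and the capacity of 35000 is borrowed from the array size in the question.

```java
class ArrayWordCount {
    // Assumption: the text has at most 35000 distinct words (size taken from the question).
    private final String[] words = new String[35000];
    private final int[] counts = new int[35000]; // counts[i] belongs to words[i]
    int unique = 0; // number of distinct words stored so far
    int total = 0;  // number of words read

    void add(String input) {
        total++;
        // Linear scan over the distinct words seen so far.
        for (int i = 0; i < unique; i++) {
            if (words[i].equalsIgnoreCase(input)) {
                counts[i]++;
                return;
            }
        }
        // Not found: store it as a new distinct word.
        words[unique] = input;
        counts[unique] = 1;
        unique++;
    }

    // Returns how often the word was seen, or 0 if it never appeared.
    int countOf(String word) {
        for (int i = 0; i < unique; i++) {
            if (words[i].equalsIgnoreCase(word)) {
                return counts[i];
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        ArrayWordCount counter = new ArrayWordCount();
        for (String w : "Cake a cake A piece".split(" ")) {
            counter.add(w);
        }
        System.out.println("Total words: " + counter.total);
        System.out.println("Unique words: " + counter.unique);
        for (int i = 0; i < counter.unique; i++) {
            System.out.println(counter.words[i] + ": " + counter.counts[i]);
        }
    }
}
```

In the question's program, `add` would be called once per token returned by `sc.next()`. Each call scans all distinct words found so far, so the total work grows roughly with the number of words times the number of distinct words.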

You can use a map and increment the value (the count) each time you add a word that is already in the map.

Every time you add a word, you need to check whether it already exists in your array. To compare words regardless of case, use:

 word1.equalsIgnoreCase(word2);

Try this:

// needs imports from java.io (File, FileNotFoundException) and java.util
try {
    List<String> list = new ArrayList<String>();
    int totalWords = 0;
    File fr = new File("Alice.txt");
    Scanner sc = new Scanner(fr);
    while (sc.hasNext()) {
        // next() already returns one whitespace-delimited token; lower-case it
        // so that "Cake" and "cake" count as the same word
        list.add(sc.next().toLowerCase());
        totalWords++;
    }
    sc.close();
    System.out.println("Total words: " + totalWords);
    Set<String> uniqueSet = new HashSet<String>(list);
    System.out.println("Unique words: " + uniqueSet.size());
    System.out.println("Words with their frequency..");
    for (String word : uniqueSet) {
        // Collections.frequency scans the whole list for each word
        System.out.println(word + ": " + Collections.frequency(list, word));
    }
} catch (FileNotFoundException e) {
    System.out.println("File not found");
}

You can improve on simple array searching using Arrays.sort and Arrays.binarySearch.

Essentially, for each word, check if it is already in your array with binarySearch. If it is, increment your count. If it is not, add it to the array and sort again. The current Java sort algorithm is very fast when the array is already mostly sorted. It uses TimSort.
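
A minimal sketch of this idea follows. It is a variation rather than the exact procedure described: instead of appending and calling Arrays.sort again, it uses the insertion point that Arrays.binarySearch reports and shifts the tail of the array with System.arraycopy, which keeps the per-word counts aligned with the words. The class name `SortedArrayWordCount`, the field names, and the capacity of 35000 are assumptions for illustration.

```java
import java.util.Arrays;

class SortedArrayWordCount {
    // Assumption: at most 35000 distinct words (size taken from the question).
    final String[] words = new String[35000]; // words[0..unique-1] is kept sorted
    final int[] counts = new int[35000];      // counts[i] belongs to words[i]
    int unique = 0;

    void add(String input) {
        String key = input.toLowerCase(); // store lower-case so the ordering ignores case
        // Search only the filled part of the array.
        int idx = Arrays.binarySearch(words, 0, unique, key);
        if (idx >= 0) {
            counts[idx]++; // word already present
            return;
        }
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        int insertAt = -(idx + 1);
        // Shift the tail one slot to the right in both arrays.
        System.arraycopy(words, insertAt, words, insertAt + 1, unique - insertAt);
        System.arraycopy(counts, insertAt, counts, insertAt + 1, unique - insertAt);
        words[insertAt] = key;
        counts[insertAt] = 1;
        unique++;
    }

    public static void main(String[] args) {
        SortedArrayWordCount counter = new SortedArrayWordCount();
        for (String w : "pear Apple apple fig".split(" ")) {
            counter.add(w);
        }
        System.out.println("Unique words: " + counter.unique);
        for (int i = 0; i < counter.unique; i++) {
            System.out.println(counter.words[i] + ": " + counter.counts[i]);
        }
    }
}
```

Each lookup is a binary search instead of a full scan, and as a side effect the words come out in alphabetical order.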

There are other structures, such as TreeSet, that you could use to avoid hashing, but I suspect those would also be disallowed.
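
For completeness, here is a short sketch of the TreeSet route, assuming it were allowed. A TreeSet keeps its elements ordered with a comparator rather than hashing them, and String.CASE_INSENSITIVE_ORDER makes "Cake" and "cake" compare as equal. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

class TreeSetUniqueCount {
    // Returns the number of distinct words, ignoring case.
    static int countUnique(String[] words) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String w : words) {
            if (w != null) { // skip unused slots of a partially filled array
                seen.add(w);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] words = {"Cake", "cake", "a", null, "A", "of"};
        System.out.println("Unique words: " + countUnique(words));
    }
}
```

This gives only the number of distinct words; per-word counts would need an ordered map or a parallel structure.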
