Word frequency counter: issue with logic (Java)

I am building a basic word frequency counter. The code is listed below:

public static List<Frequency> computeWordFrequencies(List<String> words) 
{
    List<Frequency> list_of_frequency = new ArrayList<Frequency>();
    List<String> list_of_words = words;
    int j = 0;
    for(int i=0; i<list_of_words.size(); i++)
    {

        String current_word = list_of_words.get(i);
        boolean added = false;
        if(list_of_frequency.size() == 0)
        {
            list_of_frequency.add(new Frequency(current_word, 1));
            System.out.println("added " + current_word);
        }
        else
        {

            System.out.println("Current word: " + current_word);
            System.out.println("Current Frequency: " + list_of_frequency.get(j).getText());
            if(list_of_frequency.contains(current_word))
            {
                list_of_frequency.get(j).incrementFrequency();
                System.out.println("found... incremented " + list_of_frequency.get(j).getText() + " frequency");
                added = true;
            }
            else
            {
                list_of_frequency.add(new Frequency(current_word, 1));
                System.out.println("added " + current_word);
                added = true;
            }
        }
    }
}

and the output I am getting is:

added I
Current word: am
Current Frequency: I
added am
Current word: very
Current Frequency: I
added very
Current word: good
Current Frequency: I
added good
Current word: at
Current Frequency: I
added at
Current word: being
Current Frequency: I
added being
Current word: good
Current Frequency: I
added good
Total item count: 7
Unique item count: 7
I:1
am:1
very:1
good:1
at:1
being:1
good:1

So I need a for loop to go through list_of_frequency, but when I do that I run into other problems, such as adding the same word repeatedly. Is my logic right here, and is there a better way to go about this project? Thanks in advance!

You can do this using the frequency method of the Collections class.

Here is a sample:

public void wordFreq() {
    String text = "hello bye hello a bb a bye hello";

    List<String> list = Arrays.asList(text.split(" "));

    // The set holds each distinct word once; frequency() counts it in the full list
    Set<String> uniqueWords = new HashSet<String>(list);
    for (String word : uniqueWords) {
        System.out.println(word + ": " + Collections.frequency(list, word));
    }
}

You are over-complicating things.

You only need a few lines:

public static Map<String, Integer> getFrequencies(List<String> words) {
    Map<String, Integer> freq = new HashMap<String, Integer>();
    for (String word : words) {
        Integer i = freq.get(word);
        freq.put(word, i == null ? 1 : i + 1);
    }
    return freq;
}
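
On Java 8 or later, the get/put pair in getFrequencies above can also be written with Map.merge. The following is only a sketch of that variant: the class name FrequencyDemo is made up for illustration, and the word list is the one implied by the output in the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// FrequencyDemo is a hypothetical name used only for this sketch.
class FrequencyDemo {
    // Same idea as getFrequencies above, written with Map.merge (Java 8+).
    static Map<String, Integer> getFrequencies(List<String> words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : words) {
            // Stores 1 for a word seen for the first time,
            // otherwise adds 1 to the count already stored.
            freq.merge(word, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("I", "am", "very", "good", "at", "being", "good");
        Map<String, Integer> freq = getFrequencies(words);
        System.out.println("good: " + freq.get("good"));
        System.out.println("Unique item count: " + freq.size());
    }
}
```

With that input, "good" should end up with a count of 2 and the map should hold 6 distinct words. A HashMap does not keep insertion order, so use a LinkedHashMap if the order of first appearance matters.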

Add this code inside the else branch. What you should do is:

  1. In a loop, check whether the word is already in the list.
  2. If it is, just increment its frequency.
  3. Otherwise, add it to the list with frequency 1.

     for (j = 0; j < list_of_frequency.size(); j++) {
         if (list_of_frequency.get(j).getText().equals(current_word)) {
             list_of_frequency.get(j).incrementFrequency(); // word was already encountered: increment its frequency
             break;
         }
     }
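
For context on why the original check never fires: list_of_frequency.contains(current_word) asks whether a String equals one of the Frequency objects in the list. String.equals returns false for anything that is not a String, so the else branch always runs. The three steps above could be put together roughly as follows. The Frequency class here is a guess at what the asker has (it is not shown in the question), and ListBasedCounter is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the asker's Frequency class, which the question does not show.
class Frequency {
    private final String text;
    private int frequency;

    Frequency(String text, int frequency) {
        this.text = text;
        this.frequency = frequency;
    }

    String getText() { return text; }
    int getFrequency() { return frequency; }
    void incrementFrequency() { frequency++; }
}

// ListBasedCounter is a made-up name for this sketch.
class ListBasedCounter {
    static List<Frequency> computeWordFrequencies(List<String> words) {
        List<Frequency> listOfFrequency = new ArrayList<>();
        for (String currentWord : words) {
            boolean found = false;
            // Step 1: scan the entries collected so far, comparing by text.
            for (Frequency entry : listOfFrequency) {
                if (entry.getText().equals(currentWord)) {
                    entry.incrementFrequency(); // Step 2: known word, bump its count
                    found = true;
                    break;
                }
            }
            if (!found) {
                // Step 3: first occurrence, start at frequency 1
                listOfFrequency.add(new Frequency(currentWord, 1));
            }
        }
        return listOfFrequency;
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("I", "am", "very", "good", "at", "being", "good");
        for (Frequency f : computeWordFrequencies(words)) {
            System.out.println(f.getText() + ":" + f.getFrequency());
        }
    }
}
```

The empty-list special case from the question is no longer needed, because the inner loop simply does nothing when the list is empty. The inner scan makes this quadratic in the worst case, which is fine for small inputs.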

I think to run faster you should use another algorithm that starts by sorting the list:

  1) Sort your list of strings (cf. java.util.Collections.sort())
  2) In pseudocode:
 iterate over your sorted list
 current_word = word of current iteration
 if (current_word.equals(oldWord)) {
    counter++
 } else {
    if oldWord is set, create Frequency(oldWord, counter) and add it to the list of frequencies
    store current_word in variable oldWord
    counter = 1
 }
 after the loop, create the Frequency for the last oldWord and add it as well

This way you don't need to scan your frequency list for every word, and each distinct word is inserted exactly once, which is quicker.
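
One way the sort-then-count idea could look in Java is sketched below. The nested Frequency class is a hypothetical stand-in for the one in the question, SortedCounter is a made-up name, and the code assumes the list contains no null entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// SortedCounter is a made-up name for this sketch.
class SortedCounter {
    // Hypothetical stand-in for the Frequency class from the question.
    static class Frequency {
        final String text;
        final int count;

        Frequency(String text, int count) {
            this.text = text;
            this.count = count;
        }
    }

    static List<Frequency> computeWordFrequencies(List<String> words) {
        // Sort a copy so the caller's list is left untouched.
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted);

        List<Frequency> result = new ArrayList<>();
        String oldWord = null;
        int counter = 0;
        for (String currentWord : sorted) {
            if (currentWord.equals(oldWord)) {
                counter++; // still inside a run of the same word
            } else {
                if (oldWord != null) {
                    // The word changed: record the run that just ended.
                    result.add(new Frequency(oldWord, counter));
                }
                oldWord = currentWord;
                counter = 1;
            }
        }
        if (oldWord != null) {
            // Record the final run, which no later word change flushes.
            result.add(new Frequency(oldWord, counter));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("I", "am", "very", "good", "at", "being", "good");
        for (Frequency f : computeWordFrequencies(words)) {
            System.out.println(f.text + ":" + f.count);
        }
    }
}
```

The result comes back in sorted order rather than in order of first appearance. The sort costs O(n log n), so a HashMap-based count is still likely to be faster on large inputs.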

Since all entries of list_of_frequency are unique words, you could also use a Set instead of a List for list_of_frequency.

Replace your method with this. You'll get much better performance by using a map while analyzing your data.

public static List<Frequency> computeWordFrequencies(List<String> words) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for(String word : words) {
        Integer current = counts.get(word);
        if(current != null) {
            counts.put(word, current+1);
        }
        else counts.put(word, 1);
    }

    // Then, if you really need that list of Frequency
    List<Frequency> list_of_frequency = new ArrayList<Frequency>();

    for(String s : counts.keySet()) {
        list_of_frequency.add(new Frequency(s, counts.get(s)));
    }

    return list_of_frequency;
}

I would proceed like this:

List<String> words = Arrays.asList("foo", "bar", "qux", "foo");

Map<String, AtomicInteger> frequencyMap = new HashMap<String, AtomicInteger>();
for (String word : words)
{
    AtomicInteger freq = frequencyMap.get(word);
    if (freq == null) {
        frequencyMap.put(word, new AtomicInteger(1));
    }
    else
    {
        freq.incrementAndGet();
    }
}

for (String word : frequencyMap.keySet())
{
    System.out.println(word + " :" + frequencyMap.get(word));
}

By using AtomicInteger you can increment the frequency counter in place, without putting a new value back into the map.
