I want to create a dictionary of all unique words in the text. The key is the word and the value is the word's frequency.
dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home']
word_listT = str(' '.join(dtt)).split()
wordsT = {v:k for (k, v) in enumerate(word_listT)}
print wordsT
I expect something like this:
{'we': 2, 'is': 1, 'peace': 2, 'at': 2, 'want': 2, 'our': 3, 'home': 4, 'you': 1, 'went': 1, 'nice': 1}
However, I receive this:
{'we': 14, 'is': 12, 'peace': 16, 'at': 17, 'want': 15, 'our': 10, 'home': 18, 'you': 0, 'went': 7, 'nice': 13}
Apparently, I am misusing the functionality or doing something wrong.
Please help.
The problem is that you are storing the list index of each word instead of a count of how many times the word occurs.
To get the counts, you can just use collections.Counter:
from collections import Counter
dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home']
counted_words = Counter(' '.join(dtt).split())
# if you want to see what the counted words are you can print it
print counted_words
>>> Counter({'home': 4, 'our': 3, 'we': 2, 'peace': 2, 'at': 2, 'want': 2, 'is': 1, 'you': 1, 'went': 1, 'nice': 1})
SOME CLEANUP: as mentioned in the comments, str() is unnecessary for your ' '.join(dtt).split().
You can also remove the list assignment and do your counter on the same line:
Counter(' '.join(dtt).split())
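If you need a plain dictionary or only the most frequent words, a minimal sketch along these lines may help. The names words and top_two are only illustrative, and print is called as a function so the snippet also runs on Python 3.

```python
from collections import Counter

dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']

counted_words = Counter(' '.join(dtt).split())

# Counter is a dict subclass; dict() gives a plain dictionary if you need one.
words = dict(counted_words)

# most_common(n) returns the n highest counts as (word, count) pairs.
top_two = counted_words.most_common(2)
print(top_two)  # [('home', 4), ('our', 3)]
```

A Counter also returns 0 for a word it has never seen instead of raising KeyError, which is convenient for lookups.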
A little more detail about your list indices; first you have to understand what your code is doing.
dtt = [
'you want home at our peace',
'we went our home',
'our home is nice',
'we want peace at home'
]
Notice you have 19 words here; print len(word_listT) returns 19. Now on the next line, word_listT = str(' '.join(dtt)).split(), you are making a list of all of the words, which looks like this:
word_listT = [
'you',
'want',
'home',
'at',
'our',
'peace',
'we',
'went',
'our',
'home',
'our',
'home',
'is',
'nice',
'we',
'want',
'peace',
'at',
'home'
]
Count them again: 19 words. The very last word is 'home'. List indices start at 0, so indices 0 to 18 cover 19 elements, and word_listT[18] is 'home'. The number you see has nothing to do with a position in the string; it is just the index in your new list. :)
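To see the overwriting directly, here is a small sketch that rebuilds the dictionary from the question. The names word_list and last_index are only illustrative, and print is called as a function so it also runs on Python 3.

```python
dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
word_list = ' '.join(dtt).split()

# enumerate yields (index, word) pairs. A repeated word overwrites the
# index stored by its earlier occurrence, so only the last index survives.
last_index = {word: index for index, word in enumerate(word_list)}

print(len(word_list))      # 19
print(last_index['home'])  # 18, the final position of 'home'
print(last_index['you'])   # 0, 'you' occurs only once, at the start
```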
Try this:
from collections import defaultdict
dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home']
word_list = str(' '.join(dtt)).split()
d = defaultdict(int)
for word in word_list:
    d[word] += 1
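As a quick check of the defaultdict approach, the sketch below runs the same loop and compares the result with the dictionary expected in the question. The name expected is only illustrative, and print is called as a function so it also runs on Python 3.

```python
from collections import defaultdict

dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']

d = defaultdict(int)  # missing keys start at int(), which is 0
for word in ' '.join(dtt).split():
    d[word] += 1

expected = {'we': 2, 'is': 1, 'peace': 2, 'at': 2, 'want': 2,
            'our': 3, 'home': 4, 'you': 1, 'went': 1, 'nice': 1}

# A defaultdict compares equal to a plain dict with the same items.
print(d == expected)  # True
print(d['home'])      # 4
```

One thing to keep in mind: reading a missing key from a defaultdict inserts it with the default value, so convert with dict(d) if you want lookups of unknown words to raise KeyError.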
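enumerate yields (index, word) pairs, so it gives you positions, not frequencies. That is, when you create the wordsT dictionary, each key is a word and each value is the index in word_listT of the last occurrence of that word, because later occurrences overwrite earlier ones (note that in your comprehension k is the index and v is the word). To do what you want, a for loop is probably the most straightforward: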
wordsT = {}
for word in word_listT:
    try:
        wordsT[word] += 1
    except KeyError:
        wordsT[word] = 1
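A variant of the same loop, swapping the try/except for dict.get with a default of 0, might look like the sketch below. This is an alternative spelling rather than the approach given above, and it rebuilds word_listT so that it runs on its own.

```python
dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
word_listT = ' '.join(dtt).split()

wordsT = {}
for word in word_listT:
    # get() returns 0 when the word has not been seen yet.
    wordsT[word] = wordsT.get(word, 0) + 1

print(wordsT['our'])  # 3
```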