For a given list of strings, count the number of words
I have a given list of strings:
strings = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems", "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
Now I want to count how often each word occurs in each sentence, so that my output will be:
{'the': 2, 'method': 1, 'of': 1, 'lagrange': 1, 'multipliers': 1, 'is': 1, 'economists': 1, 'workhorse': 1, 'for': 1, 'solving': 1, 'optimization': 1, 'problems': 1}
{'the': 1, 'technique': 1, 'is': 1, 'a': 1, 'centerpiece': 1, 'of': 1, 'economic': 1, 'theory': 1, 'but': 1, 'unfortunately': 1, 'its': 1, 'usually': 1, 'taught': 1, 'poorly': 1}
My code is as below:
from collections import Counter
dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
"the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
for index,row in enumerate(dataset):
    word_frequency = dict(Counter(row.split(" ")))
print(word_frequency)
With this I am getting the following output:
{'the': 1, 'technique': 1, 'is': 1, 'a': 1, 'centerpiece': 1, 'of': 1, 'economic': 1, 'theory': 1, 'but': 1, 'unfortunately': 1, 'its': 1, 'usually': 1, 'taught': 1, 'poorly': 1}
Clearly it's only counting the words of the second sentence and not those of the first one.
Can anyone help me understand what is wrong in my code?
The word_frequency variable is reassigned for every string in the dataset list, so once the loop ends it holds only the Counter for the last string. That is why only the word counts for the last string are displayed. You can either call print(word_frequency) inside the for loop, or append each word_frequency to a list and print that list once you are out of the loop.
from collections import Counter
dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
"the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
l = []
for index,row in enumerate(dataset):
    word_frequency = dict(Counter(row.split(" ")))
    l.append(word_frequency)
print(l)
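The same list-building idea can be sketched more compactly with a list comprehension. This is only one possible variant, not part of the answer above; the helper name count_words_per_sentence is hypothetical, and it assumes that splitting on whitespace is a good enough definition of a "word".

```python
from collections import Counter

def count_words_per_sentence(sentences):
    # Hypothetical helper: build one dict of word counts per sentence.
    return [dict(Counter(sentence.split())) for sentence in sentences]

dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

per_sentence = count_words_per_sentence(dataset)
for counts in per_sentence:
    print(counts)
```

Keeping the results in a list means the counts stay available after the loop instead of only being printed.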
Just move the print inside your for loop, since you're overwriting your calculated word_frequency variable on every iteration.
Print inside the loop instead of setting a variable inside the loop (which will get overwritten on each iteration before you print it at the end):
>>> dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
... "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
>>> from collections import Counter
>>> for sentence in dataset:
...     print(dict(Counter(sentence.split())))
...
{'the': 2, 'method': 1, 'of': 1, 'lagrange': 1, 'multipliers': 1, 'is': 1, 'economists': 1, 'workhorse': 1, 'for': 1, 'solving': 1, 'optimization': 1, 'problems': 1}
{'the': 1, 'technique': 1, 'is': 1, 'a': 1, 'centerpiece': 1, 'of': 1, 'economic': 1, 'theory': 1, 'but': 1, 'unfortunately': 1, 'its': 1, 'usually': 1, 'taught': 1, 'poorly': 1}
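Note that the snippet above calls split() with no argument, whereas the question's code calls split(" "). For the sentences in the question both give the same words, but they can differ when the text contains repeated spaces. A small sketch, using a made-up string that is not from the question:

```python
# Hypothetical input containing a double space between "the" and "method".
text = "the  method of"

by_space = text.split(" ")   # splits on every single space character
by_whitespace = text.split() # splits on runs of whitespace

print(by_space)       # ['the', '', 'method', 'of']
print(by_whitespace)  # ['the', 'method', 'of']
```

With split(" "), the empty string would be counted as a word by Counter, so split() is probably the safer default for this kind of counting.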
You're overwriting the word_frequency variable inside the for loop, meaning only the count from the final iteration is printed.
You should instead define a Counter outside the for loop, add each sentence's counts to it inside the loop using Counter's addition support (+=), then convert to dict and print at the end. Note that this produces one combined count across all sentences rather than one dict per sentence:
from collections import Counter
dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
"the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
cumulative_counter = Counter()
for index,row in enumerate(dataset):
    cumulative_counter += Counter(row.split(" "))
word_frequency = dict(cumulative_counter)
print(word_frequency)
Outputs:
{'the': 3, 'method': 1, 'of': 2, 'lagrange': 1, 'multipliers': 1, 'is': 2, 'economists': 1, 'workhorse': 1, 'for': 1, 'solving': 1, 'optimization': 1, 'problems': 1, 'technique': 1, 'a': 1, 'centerpiece': 1, 'economic': 1, 'theory': 1, 'but': 1, 'unfortunately': 1, 'its': 1, 'usually': 1, 'taught': 1, 'poorly': 1}
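A combined total like this can also be built with Counter.update, which adds counts to an existing Counter in place. The following is a minimal sketch under the same whitespace-splitting assumption; the variable name total is illustrative only.

```python
from collections import Counter

dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

total = Counter()
for sentence in dataset:
    # update() adds the counts of the given words to the running total.
    total.update(sentence.split())

print(total.most_common(1))  # [('the', 3)]
```

This avoids creating a temporary Counter for each sentence, although for a list this small the difference is negligible.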