Count the words in each string of a given list

Question

I have a list of strings:

strings = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems", "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

I want to count how many times each word occurs in each sentence, so that my output is:

{'the': 2, 'method': 1, 'of': 1, 'lagrange': 1, 'multipliers': 1, 'is': 1, 'economists': 1, 'workhorse': 1, 'for': 1, 'solving': 1, 'optimization': 1, 'problems': 1}

{'the': 1, 'technique': 1, 'is': 1, 'a': 1, 'centerpiece': 1, 'of': 1, 'economic': 1, 'theory': 1, 'but': 1, 'unfortunately': 1, 'its': 1, 'usually': 1, 'taught': 1, 'poorly': 1}

My code is below:

from collections import Counter

dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]


for index,row in enumerate(dataset):
    word_frequency = dict(Counter(row.split(" ")))
    
print(word_frequency)

With this, I get the following output:

{'the': 1, 'technique': 1, 'is': 1, 'a': 1, 'centerpiece': 1, 'of': 1, 'economic': 1, 'theory': 1, 'but': 1, 'unfortunately': 1, 'its': 1, 'usually': 1, 'taught': 1, 'poorly': 1}

Clearly it only counts the words in the second sentence, not the first.

Can anyone help me understand what is wrong with my code?

Answer 1

word_frequency is reassigned on every iteration over the dataset list, so after the loop it holds only the counts for the last string, and that is what gets printed. Either call print(word_frequency) inside the for loop, or append each word_frequency to a list and print the list once the loop has finished.

from collections import Counter

dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

l = []
for index,row in enumerate(dataset):
    word_frequency = dict(Counter(row.split(" ")))
    l.append(word_frequency)
    
print(l)
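
As a variation on the list approach above, the per-sentence counts can also be built with a list comprehension. This is only a minimal sketch; the helper name count_words_per_sentence is hypothetical and does not appear in the original post.

```python
from collections import Counter


def count_words_per_sentence(sentences):
    """Return one word-frequency dict per sentence, in input order."""
    # Hypothetical helper name, chosen for illustration only.
    # split() with no argument splits on any run of whitespace.
    return [dict(Counter(sentence.split())) for sentence in sentences]


dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

frequencies = count_words_per_sentence(dataset)
for frequency in frequencies:
    print(frequency)
```

Returning the list keeps every sentence's result available after the loop, which is what the original code lost.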

Answer 2

Just move the print call inside your for loop, because each iteration overwrites the word_frequency variable you just computed.
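
A minimal sketch of that change, reusing the code from the question with only the print call moved into the loop body. Printing the index alongside each result is an illustrative addition, not part of the original code.

```python
from collections import Counter

dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

for index, row in enumerate(dataset):
    word_frequency = dict(Counter(row.split(" ")))
    # Printing here runs once per sentence, before the next
    # iteration reassigns word_frequency.
    print(index, word_frequency)
```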

Answer 3

Print inside the loop instead of assigning to a variable there; the variable is overwritten on each iteration, so printing it after the loop shows only the last value:

>>> dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
...            "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
>>> from collections import Counter
>>> for sentence in dataset:
...     print(dict(Counter(sentence.split())))
...
{'the': 2, 'method': 1, 'of': 1, 'lagrange': 1, 'multipliers': 1, 'is': 1, 'economists': 1, 'workhorse': 1, 'for': 1, 'solving': 1, 'optimization': 1, 'problems': 1}
{'the': 1, 'technique': 1, 'is': 1, 'a': 1, 'centerpiece': 1, 'of': 1, 'economic': 1, 'theory': 1, 'but': 1, 'unfortunately': 1, 'its': 1, 'usually': 1, 'taught': 1, 'poorly': 1}
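
One detail in this answer is that it calls split() with no argument, whereas the question uses split(" "). Both give the same result for the sentences here, but they differ when the text contains consecutive spaces. The sketch below uses a made-up input string to illustrate the difference.

```python
from collections import Counter

# Made-up input with double spaces, for illustration only.
text = "the  method of  the"

# split(" ") yields an empty string between each pair of adjacent spaces.
with_space = Counter(text.split(" "))

# split() treats any run of whitespace as a single separator.
without_arg = Counter(text.split())

print(dict(with_space))
print(dict(without_arg))
```

With split(" ") the empty string is counted as if it were a word, so split() is usually the safer choice for text that may not be cleanly spaced.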

Answer 4

You're overwriting the word_frequency variable inside the for loop, so only the count from the final iteration is printed.

If you want one combined count across all sentences (rather than one dict per sentence), define a Counter outside the for loop, add each sentence's counts to it with += inside the loop, then convert it to a dict and print it at the end:

from collections import Counter

dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

cumulative_counter = Counter()
for index,row in enumerate(dataset):
  cumulative_counter += Counter(row.split(" "))

word_frequency = dict(cumulative_counter)
print(word_frequency)

Output:

{'the': 3, 'method': 1, 'of': 2, 'lagrange': 1, 'multipliers': 1, 'is': 2, 'economists': 1, 'workhorse': 1, 'for': 1, 'solving': 1, 'optimization': 1, 'problems': 1, 'technique': 1, 'a': 1, 'centerpiece': 1, 'economic': 1, 'theory': 1, 'but': 1, 'unfortunately': 1, 'its': 1, 'usually': 1, 'taught': 1, 'poorly': 1}
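
The same combined total can also be accumulated with Counter.update, which adds counts in place instead of building a new Counter on each iteration. This is a sketch of an alternative, not the answer author's code; the variable name total is an assumption.

```python
from collections import Counter

dataset = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

total = Counter()
for row in dataset:
    # update() with an iterable increments the count of each element.
    total.update(row.split())

print(dict(total))
```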

4 answers

Answer 1  0  2021-06-15 15:00:23

Answer 2  -1  2021-06-15 14:52:56

Answer 3  -1  2021-06-15 14:57:14

Answer 4  -2  2021-06-15 14:57:35
