How to find a list of words in a corpus

Question

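For each word in list c, I have to find how many rows of the corpus contain it.

I expect the answer to be [1,3,2,4,1,1,4,1,4]

This means the word "and" appears only in row3, so the answer is 1.

The word "document" appears in row1, row2 and row4, so the answer is 3, and so on.

Please correct my program, and if you have a simpler program, please suggest it. Thanks.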

corpus= [
         'this is the first document',            #row1
         'this document is the second document',  #row2
         'and this is the third one',             #row3
         'is this the first document',            #row4
    ]

c=['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

a=[]
count=0

for words in c:
  a.append(count)
  count=0
  for row in corpus:
    if words in row:
      count=count+1
print(a)

Answer 1

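Your only problem is that you call append() in the wrong place.

You have to call it after the inner for-loop, once all rows have been checked for the current word.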

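a = []
for words in c:
  count = 0                # reset the counter for each word
  for row in corpus:
    if words in row:       # substring test against the whole row
      count = count + 1
  a.append(count)          # store the count after all rows are checked
print(a)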
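Since the question also asks for a simpler program, the same loop can be written as a list comprehension. This is only a minimal sketch, reusing the `corpus` and `c` from the question. Note that `word in row` is a substring test, so "is" would also match inside "this"; that happens not to change the result for this data.

```python
corpus = [
    'this is the first document',            # row1
    'this document is the second document',  # row2
    'and this is the third one',             # row3
    'is this the first document',            # row4
]

c = ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

# For each word, count the rows in which it occurs as a substring.
# True counts as 1 and False as 0 when summed.
a = [sum(word in row for row in corpus) for word in c]
print(a)  # [1, 3, 2, 4, 1, 1, 4, 1, 4]
```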

Answer 2

This seems to work.

from collections import Counter

# Split every row into words and collect them in one flat list.
words = []
for corp in corpus:
    words.extend(corp.split())

# Count the total number of occurrences of each word.
word_counts = Counter(words)

# A Counter returns 0 for missing keys, so no membership check is needed.
word_counts_list = [word_counts[word] for word in c]

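This is not the result you expected, but I think your expected result is incorrect: "document" occurs four times in total, because it appears twice in row2.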

word_counts_list
Out[136]: [1, 4, 2, 4, 1, 1, 4, 1, 4]
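The Counter approach above counts every occurrence, which is why "document" comes out as 4. If the goal is the number of rows that contain each word, as the question describes, one possible variation is to de-duplicate the words of each row before counting. The sketch below assumes words are separated by whitespace, and `row_counts` and `rows_per_word` are illustrative names that do not appear in the original answers.

```python
from collections import Counter

corpus = [
    'this is the first document',            # row1
    'this document is the second document',  # row2
    'and this is the third one',             # row3
    'is this the first document',            # row4
]

c = ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

# Count each word at most once per row by turning the row into a set.
row_counts = Counter()
for row in corpus:
    row_counts.update(set(row.split()))

# A Counter returns 0 for words that never occur.
rows_per_word = [row_counts[word] for word in c]
print(rows_per_word)  # [1, 3, 2, 4, 1, 1, 4, 1, 4]
```

Unlike the `in` test on the raw string, this matches whole words only, so "is" is not counted inside "this".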
