How to find the list of words in the corpus
I have a list of words, c, and for each word I need to find how many rows of the corpus contain it.
I am expecting the answer [1,3,2,4,1,1,4,1,4].
That is, the word "and" is present only in row3, hence the answer "1";
the word "document" is present in row1, row2 and row4, hence the answer "3", and so on.
Kindly correct my program, and if you know an easier way, please suggest that too. Thank you.
corpus = [
    'this is the first document',            #row1
    'this document is the second document',  #row2
    'and this is the third one',             #row3
    'is this the first document',            #row4
]
c = ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
a = []
count = 0
for words in c:
    a.append(count)
    count = 0
    for row in corpus:
        if words in row:
            count = count + 1
print(a)
Your only problem is that you call append() in the wrong place. You have to call it after the inner for-loop, once count has been computed for the current word.
for words in c:
    count = 0
    for row in corpus:
        if words in row:
            count = count + 1
    a.append(count)
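One thing to keep in mind with the loop above: `words in row` is a substring test on strings, so 'is' would also match inside 'this'. For this particular corpus it makes no difference to the counts, but on other data it could. A minimal sketch of a whole-word variant, assuming words are separated by whitespace (the name row_words is only for illustration):

```python
corpus = [
    'this is the first document',            # row1
    'this document is the second document',  # row2
    'and this is the third one',             # row3
    'is this the first document',            # row4
]
c = ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

# Split each row once into a set of whole words, so membership tests
# compare complete words instead of substrings.
row_words = [set(row.split()) for row in corpus]

# For each word, count the rows whose word set contains it.
a = [sum(word in words for words in row_words) for word in c]
print(a)  # [1, 3, 2, 4, 1, 1, 4, 1, 4]
```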
This seems to work.
from collections import Counter
words = []
for corp in corpus:
    words.extend(corp.split())
word_counts = Counter(words)
word_counts_list = []
for word in c:
    if word not in word_counts:
        word_counts_list.append(0)
    else:
        word_counts_list.append(word_counts[word])
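This is not the result you were expecting, but I think the result you were expecting is not correct: this version counts every occurrence of a word, and "document" appears twice in row2, so it scores 4 rather than 3.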
word_counts_list
Out[136]: [1, 4, 2, 4, 1, 1, 4, 1, 4]
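If what is wanted is the number of rows containing each word, rather than the total number of occurrences, the Counter approach can be adapted by counting each row's unique words only once. A rough sketch, assuming whitespace-separated words (row_counts and row_counts_list are names made up for this example):

```python
from collections import Counter

corpus = [
    'this is the first document',            # row1
    'this document is the second document',  # row2
    'and this is the third one',             # row3
    'is this the first document',            # row4
]
c = ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

row_counts = Counter()
for row in corpus:
    # set() removes repeats within a row, so each row adds at most 1 per word.
    row_counts.update(set(row.split()))

# A Counter returns 0 for missing keys, so no explicit membership check is needed.
row_counts_list = [row_counts[word] for word in c]
print(row_counts_list)  # [1, 3, 2, 4, 1, 1, 4, 1, 4]
```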