Counting based on a column and a string in pandas
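Suppose I have several documents and a df column containing specific words to search for. How can I count the number of times each word appears in each document?
An example explains it better.
Example: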
doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"
doc2 = "Frog that barks. Frog like cats."
df['words'] = ["dog","cat","frog"]
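I am looking to turn this into a df with one row per word and one column of counts per document.
I tried something like this, but I realized it just keeps looping over the same cell, so I always end up with zero.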
for i in range(len(doc)):
    for key, value in doc.items():
        for word in df['word']:
            df['doc_' + str(i)] = value.count(word)
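A minimal sketch of why a loop like this ends in zeros, assuming the documents are plain strings and `df` holds the three search words (the column name `doc_0` is only illustrative). Assigning a scalar to a column fills every row, so each pass overwrites the previous one, and `str.count` is case-sensitive.

```python
import pandas as pd

df = pd.DataFrame({"words": ["dog", "cat", "frog"]})
doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"

for word in df["words"]:
    # A scalar assignment broadcasts to the whole column,
    # so every iteration replaces the values written by the last one.
    df["doc_0"] = doc1.count(word)

# Only the count for the final word ("frog") survives, in every row.
print(df)
```

Building the counts per word first and assigning them in one step avoids the overwrite.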
import numpy as np
import pandas as pd

doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"
doc2 = "Frog that barks. Frog like cats."
strings = [doc1, doc2]
words = ["dog", "cat", "frog"]

def count_occ(word, sentence):
    # Lowercase, split on whitespace, and count exact token matches.
    return sentence.lower().split().count(word)

def counts_df(strings, words):
    # Keep the list local so repeated calls do not accumulate counts.
    cts = []
    for w in words:
        for s in strings:
            cts.append(count_occ(w, s))
    # One row per word, one column per document.
    df = pd.DataFrame(np.array(cts).reshape((len(words), len(strings))),
                      index=words,
                      columns=['doc' + str(i) for i in range(1, len(strings) + 1)])
    return df

counts_df(strings, words)
Out[61]:
      doc1  doc2
dog      1     0
cat      2     0
frog     0     2
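Note that splitting on whitespace leaves punctuation attached, so a token such as "cat." would not match "cat". A possible variant, sketched here under the assumption that the search words are already lowercase, tokenizes with a regular expression and uses `collections.Counter`. The function name `counts_by_token` is hypothetical and not part of the original answer.

```python
import re
from collections import Counter

import pandas as pd

doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"
doc2 = "Frog that barks. Frog like cats."
strings = [doc1, doc2]
words = ["dog", "cat", "frog"]


def counts_by_token(strings, words):
    columns = {}
    for i, text in enumerate(strings, start=1):
        # \w+ keeps only word characters, so "food." is counted as "food".
        tokens = Counter(re.findall(r"\w+", text.lower()))
        # Counter returns 0 for words that never appear.
        columns["doc" + str(i)] = [tokens[w] for w in words]
    return pd.DataFrame(columns, index=words)


result = counts_by_token(strings, words)
print(result)
```

For the two example documents this gives the same counts as above. "cats" is still a different token from "cat", so plural forms are not counted.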