Counting based on a column and strings in pandas

Question

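Suppose I have several documents and a df column containing specific words to search for. How can I count the number of times each word occurs in each document?

An example makes this clearer.

Example: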

doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"

doc2 = "Frog that barks. Frog like cats."

df['words'] = ["dog","cat","frog"]

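I am looking to turn this into a df that looks like this.

I tried something like the following, but I realized it just keeps looping over the same cell, so I always get zero.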

for i in range(len(doc)):
    for key, value in doc.items():
        for word in df['word']:
            df['doc_' + str(i)] = value.count(word)
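
A likely reason the loop above always gives zero: each pass assigns a single scalar to the whole column `df['doc_' + str(i)]`, so only the count for the last word of the last document survives, and `str.count` is case-sensitive, so "frog" does not match "Frog". Below is a minimal sketch of one possible fix. It assumes the documents are held in a dict named `docs` (a hypothetical name, since the question does not show how `doc` is built) and uses substring matching on lowercased text.

```python
import pandas as pd

# Assumed container for the documents; the names doc_1/doc_2 are illustrative.
docs = {
    "doc_1": "I am a cat that barks. I like dog food instead of cat food. Roff",
    "doc_2": "Frog that barks. Frog like cats.",
}
df = pd.DataFrame({"words": ["dog", "cat", "frog"]})

for name, text in docs.items():
    lowered = text.lower()
    # Build one count per row of df["words"], then assign the whole list as a column.
    # str.count matches substrings, so "cats" contributes to the count for "cat".
    df[name] = [lowered.count(word) for word in df["words"]]

print(df)
```

Because this counts substrings, "cat" is found once in the second document (inside "cats"). Whether that is desirable depends on what should count as a match.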

Answer 1

import numpy as np
import pandas as pd

doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"
doc2 = "Frog that barks. Frog like cats."
strings = [doc1, doc2]
words = ["dog","cat","frog"]

def count_occ(word, sentence):
    # Count whole-token matches, case-insensitively.
    return sentence.lower().split().count(word)

def counts_df(strings, words):
    # Collect counts locally so repeated calls start from an empty list.
    cts = []
    for w in words:
        for s in strings:
            cts.append(count_occ(w, s))
    # Rows are words, columns are documents (doc1, doc2, ...).
    df = pd.DataFrame(np.array(cts).reshape((len(words), len(strings))),
                      index=words,
                      columns=['doc' + str(i) for i in range(1, len(strings) + 1)])
    return df

counts_df(strings, words)
Out[61]: 
      doc1  doc2
dog      1     0
cat      2     0
frog     0     2
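
Note that `split()` leaves punctuation attached to tokens, so a word directly followed by a period or comma (for example `food.`) is not matched by `count_occ`. One possible variant, sketched here with `re.findall` and `collections.Counter`, tokenizes on word characters before counting. It is not part of the original answer, and the names `tokenize` and `counts_df_regex` are hypothetical.

```python
import re
from collections import Counter

import pandas as pd

def tokenize(text):
    # Lowercase, then keep runs of word characters so "cats." becomes "cats".
    return re.findall(r"\w+", text.lower())

def counts_df_regex(strings, words):
    # One Counter per document; a Counter returns 0 for tokens it has not seen.
    counters = [Counter(tokenize(s)) for s in strings]
    data = {
        "doc" + str(i): [c[w] for w in words]
        for i, c in enumerate(counters, start=1)
    }
    # Rows are words, columns are documents.
    return pd.DataFrame(data, index=words)

doc1 = "I am a cat that barks. I like dog food instead of cat food. Roff"
doc2 = "Frog that barks. Frog like cats."
result = counts_df_regex([doc1, doc2], ["dog", "cat", "frog"])
print(result)
```

On these two documents the counts match the table above. Matching is still on whole tokens, so "cats" does not count toward "cat".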

Answer 1
0 Accepted 2017-06-02 23:40:43
