How to count specific words in a pandas Series?

Question

I'm trying to count the number of keywords in a pandas DataFrame like this:

import pandas as pd

df = pd.read_csv('amazon_baby.csv')
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

The selected_words must be counted in the Series df['review'].

I tried:

def word_counter(sent):
    a={}
    for word in selected_words:
        a[word] = sent.count(word)
    return a

Then:

df['totalwords'] = df.review.str.split()
df['word_count'] = df.totalwords.apply(word_counter)

----------------------------------------------------------------------------
----> 1 df['word_count'] = df.totalwords.apply(word_counter)

c:\users\admin\appdata\local\programs\python\python36\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-51-cd11c5eb1f40> in word_counter(sent)
  2     a={}
  3     for word in selected_words:
----> 4         a[word] = sent.count(word)
  5     return a

AttributeError: 'float' object has no attribute 'count'

Can someone help? I'm guessing this happens because some of the values in the Series are not strings.
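A quick way to check that guess, using a small hypothetical Series rather than the real amazon_baby.csv: pandas stores a missing cell as NaN, which is a float, so str.split() leaves it as NaN and word_counter later receives a float. Filling missing reviews with an empty string before splitting is one possible fix, under the assumption that an empty review should simply count as zero words.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the third review is missing, which pandas stores as NaN
reviews = pd.Series(['great product, love it', 'bad', np.nan])

# NaN is a float, which is why sent.count(...) raised AttributeError
value_types = [type(v).__name__ for v in reviews]
print(value_types)  # ['str', 'str', 'float']

# Without a fix, str.split() passes the NaN through unchanged
print(reviews.str.split().tolist())

# Treating a missing review as empty text turns every cell into a list
tokens = reviews.fillna('').str.split()
print(tokens.tolist())  # [['great', 'product,', 'love', 'it'], ['bad'], []]
```

Note that split() keeps punctuation attached to words ('product,'), so a review containing "love!" would not match 'love' with this tokenization alone.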

Some people have tried to help, but the problem is that each cell in the DataFrame contains whole sentences.

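I need to extract the counts of the selected words, preferably as a dictionary, and store them in a new column of the same DataFrame, aligned with the corresponding rows.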

Data in CSV format
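The requirement in the question (a per-row dictionary of counts stored in a new column) can be sketched as follows. This is a minimal sketch, not any answerer's method: the DataFrame is a hypothetical stand-in for the CSV, and matching words case-insensitively on runs of letters is an assumption about what should count as a match.

```python
import re
from collections import Counter

import numpy as np
import pandas as pd

selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible',
                  'bad', 'terrible', 'awful', 'wow', 'hate']

# Hypothetical stand-in for pd.read_csv('amazon_baby.csv'); only 'review' is used
df = pd.DataFrame({'review': ['Great stroller, I love it! Love the color.',
                              np.nan,
                              'Bad fit. Terrible, awful seams.']})


def word_counter(sent):
    """Return {word: count} for every selected word in one review (0 if absent)."""
    if not isinstance(sent, str):  # missing reviews arrive as NaN (a float)
        sent = ''
    # Assumption: lowercase and keep runs of letters, so "Love" and "love!" both count
    counts = Counter(re.findall(r'[a-z]+', sent.lower()))
    return {word: counts[word] for word in selected_words}


# One dict per row, stored next to the review it came from
df['word_count'] = df['review'].apply(word_counter)
print(df.loc[0, 'word_count']['love'])  # 2
```

Building a Counter once per review means each review is scanned a single time, instead of once per selected word as with repeated list.count calls.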

Answer 1

Assuming your DataFrame looks like this:

import pandas as pd

df=pd.DataFrame({'A': ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate','great', 'fantastic', 'amazing', 'love', 'horrible']})
print(df)
    A
0   awesome
1   great
2   fantastic
3   amazing
4   love
5   horrible
6   bad
7   terrible
8   awful
9   wow
10  hate
11  great
12  fantastic
13  amazing
14  love
15  horrible

selected_words=['awesome','great','fantastic']

df.loc[df['A'].isin(selected_words),'A'].value_counts()
[out]
great        2
fantastic    2
awesome      1
Name: A, dtype: int64
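This answer assumes one word per cell, while the question's reviews are whole sentences. One way to bridge that, sketched here with hypothetical review text, is to split each review into lowercase words, flatten the lists with Series.explode (available in pandas 0.25 and later), and then apply the same isin/value_counts filter.

```python
import pandas as pd

selected_words = ['awesome', 'great', 'fantastic']

# Hypothetical reviews: whole sentences, one of them missing
df = pd.DataFrame({'review': ['Great toy, great price', 'Fantastic!', None, 'awesome and great']})

# Lowercase, extract runs of letters, then give each word its own row
words = df['review'].fillna('').str.lower().str.findall(r'[a-z]+').explode()

# Same filter-then-count idea as above, now applied to individual words
counts = words[words.isin(selected_words)].value_counts()
print(counts)
```

An empty review explodes to a single NaN row, which isin treats as not selected, so it does not affect the counts.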

Answer 2

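Calling list.count repeatedly in a loop is inefficient for a list of values. The complexity is O(m x n), where m is the number of selected values and n is the total number of values.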
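The complexity difference can be illustrated in plain Python with a small hypothetical token list: the loop of list.count calls scans all n words once per selected word, while collections.Counter counts everything in one pass and then needs only m dictionary lookups.

```python
from collections import Counter

selected_words = ['awesome', 'great', 'fantastic']
words = ['great', 'toy', 'great', 'awesome', 'price']  # hypothetical tokens

# O(m x n): every list.count call scans all n words, once per selected word
slow = {w: words.count(w) for w in selected_words}

# O(n + m): a single pass builds all counts, then m dictionary lookups
counts = Counter(words)
fast = {w: counts[w] for w in selected_words}

print(slow)          # {'awesome': 1, 'great': 2, 'fantastic': 0}
print(slow == fast)  # True
```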

With pandas, you can use optimized methods that run in O(n). Here, you can use value_counts followed by reindex:

res = df['A'].value_counts().reindex(selected_words)

print(res)

awesome      1
great        2
fantastic    2
Name: A, dtype: int64
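One detail worth knowing: if a selected word never occurs, reindex inserts NaN for it, which also turns the counts into floats. Series.reindex accepts a fill_value argument, so missing words can be reported as 0 instead. The data below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['great', 'great', 'awesome', 'bad']})
selected_words = ['awesome', 'great', 'fantastic']

# 'fantastic' never occurs; fill_value=0 keeps it as 0 and the dtype as integer
res = df['A'].value_counts().reindex(selected_words, fill_value=0)
print(res.to_dict())  # {'awesome': 1, 'great': 2, 'fantastic': 0}
```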

Alternatively, following @pyd's solution, filter first and then use value_counts. Both solutions have O(n) complexity.

Answer 3

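From your question, it seems you want a dictionary of counts. @pyd posted a good counting solution, but the result it produces is not a dict. If you want a dictionary as output, take a look at the code below, which is essentially an extension of the solution @pyd provided.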

import pandas as pd

df=pd.DataFrame({'A': ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate','great', 'fantastic', 'amazing', 'love', 'horrible']})

def get_count_dict(data, selected_words):

    count_dict = {}

    counts = data.loc[data['A'].isin(selected_words), 'A'].value_counts()

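    # counts is a Series indexed by word; items() yields (word, count) pairs
    # (counts[i] relied on deprecated positional lookup on a string index)
    for word, count in counts.items():
        count_dict[word] = count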

    return count_dict

selected_words=['awesome','great','fantastic']

get_count_dict(df, selected_words)

Output : {'fantastic': 2, 'great': 2, 'awesome': 1}
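The loop in get_count_dict can be shortened: value_counts already returns a word-to-count mapping, and Series.to_dict converts it to a dictionary directly. A minimal sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['awesome', 'great', 'fantastic', 'great', 'fantastic', 'bad']})
selected_words = ['awesome', 'great', 'fantastic']

# Filter to the selected words, count them, and convert the Series to a dict
count_dict = df.loc[df['A'].isin(selected_words), 'A'].value_counts().to_dict()
print(count_dict)
```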


Problem description

3 solutions

Solution 1
5 2018-09-07 11:59:30

Solution 2
0 2018-09-07 12:19:17

Solution 3
0 2018-09-07 12:22:14
