Python 3 Pandas: How do I relate the words in the rows of one column to the values in another column?

Question

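I'm not sure I've phrased this question correctly. If I haven't, please tell me and I'll correct it. The problem at hand is that I have a pandas DataFrame of movie reviews in the following format: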

... ... Sentence              Feeling
... ... "I believe I like..." "Positive"
... ... "I would like to..."  "Negative"
... ... "It is great but..."  "Neutral"

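What I want to do is build a set of all the individual words in "Sentence" and count how many times each one is used, broken down by the "Feeling" column. For example: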

Word "great" has been used in 1 negative sentence, 2 neutral, and 17 positive

Is there a chain of pandas commands I can use for this?

Answer 1

Here's an idea. It isn't fully fleshed out, but maybe it helps...

In the following I use this sample DataFrame df, just to have something to work with:

                         Sentence   Feeling
0                I believe I like  Positive
1                 I would like to  Negative
2            This is not good but   Neutral
3                       Very good  Positive
4                 It is great but   Neutral
5                 It is very good  Positive
6   Not bad, but not great either   Neutral
7                       Bad stuff  Negative
8                   Not that good  Positive
9                 I don't like it  Negative
10               I believe I like  Positive
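To reproduce the sample frame shown above, something like the following should work. It is only a sketch: the variable name rows is my own, and the data is simply copied from the printed table.

```python
import pandas as pd

# (sentence, feeling) pairs copied from the table above.
rows = [
    ("I believe I like", "Positive"),
    ("I would like to", "Negative"),
    ("This is not good but", "Neutral"),
    ("Very good", "Positive"),
    ("It is great but", "Neutral"),
    ("It is very good", "Positive"),
    ("Not bad, but not great either", "Neutral"),
    ("Bad stuff", "Negative"),
    ("Not that good", "Positive"),
    ("I don't like it", "Negative"),
    ("I believe I like", "Positive"),
]

df = pd.DataFrame(rows, columns=["Sentence", "Feeling"])
print(df)
```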

This code

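from collections import Counter

# One Counter per sentence: each distinct (lower-cased) word gets a count of 1.
df['Count'] = df['Sentence'].str.lower().str.split().agg(set).agg(Counter)
# Add up the Counters within each Feeling group, then convert to a nested dict.
counts = df[['Count', 'Feeling']].groupby('Feeling').agg(sum).to_dict()
# Drop the outer 'Count' level so that counts maps Feeling -> Counter.
counts = counts['Count']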

gives you (via print(counts)):

{'Negative': Counter({'like': 2,
                      'i': 2,
                      'to': 1,
                      'would': 1,
                      'stuff': 1,
                      'bad': 1,
                      "don't": 1,
                      'it': 1}),
 'Neutral': Counter({'but': 3,
                     'is': 2,
                     'not': 2,
                     'great': 2,
                     'this': 1,
                     'good': 1,
                     'it': 1,
                     'bad,': 1,
                     'either': 1}),
 'Positive': Counter({'good': 3,
                      'believe': 2,
                      'like': 2,
                      'i': 2,
                      'very': 2,
                      'it': 1,
                      'is': 1,
                      'that': 1,
                      'not': 1})}

The counts shown should correspond to the numbers you're looking for.
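To get the per-word summary asked for in the question, one option is to look the word up in each Counter. The sketch below uses a trimmed, hand-copied version of counts so that it runs on its own; the helper name sentences_per_feeling is hypothetical.

```python
from collections import Counter

# A trimmed copy of the `counts` dictionary printed above (only a few words kept).
counts = {
    "Negative": Counter({"like": 2, "i": 2, "bad": 1}),
    "Neutral": Counter({"but": 3, "great": 2, "good": 1}),
    "Positive": Counter({"good": 3, "like": 2, "very": 2}),
}

def sentences_per_feeling(counts, word):
    """Hypothetical helper: map each feeling to the number of sentences containing `word`."""
    word = word.lower()
    # A Counter returns 0 for keys it has never seen, so no special-casing is needed.
    return {feeling: counter[word] for feeling, counter in counts.items()}

print(sentences_per_feeling(counts, "great"))
# {'Negative': 0, 'Neutral': 2, 'Positive': 0}
```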

Some explanation: this

df['Sentence'].str.lower().str.split().agg(set)

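transforms the sentences:

to lowercase, so that the counts are accurate,
then splits them into lists of words (on whitespace),
and finally converts the lists into sets, so that a word is counted only once per sentence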

0                  {like, i, believe}
1                {like, to, i, would}
2          {good, not, but, this, is}
...
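One side effect of splitting on whitespace is that punctuation stays attached to words, which is why 'bad,' shows up as its own entry in the Neutral counts above. If that matters, a possible variation (my own assumption, not part of the original chain) is to extract words with a regular expression instead of using str.split():

```python
import pandas as pd

sentences = pd.Series([
    "Not bad, but not great either",
    "Bad stuff",
    "I don't like it",
])

# Keep runs of letters and apostrophes, so "bad," becomes "bad" and "don't" stays whole.
# The pattern is deliberately simple and only covers plain ASCII words.
word_sets = (
    sentences.str.lower()
    .str.findall(r"[a-z']+")
    .apply(lambda words: set(words))
)
print(word_sets.tolist())
```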

.agg(Counter) then applies the Counter class to each set:

0                     {'like': 1, 'believe': 1, 'i': 1}
1              {'like': 1, 'would': 1, 'i': 1, 'to': 1}
2     {'good': 1, 'is': 1, 'this': 1, 'but': 1, 'not...
...

This

df[['Count', 'Feeling']].groupby('Feeling').agg(sum)

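then groups the sub-frame df[['Count', 'Feeling']] by the Feeling column and aggregates each group by adding its Counters together. The result is one Counter per Feeling, in which each word's count is the number of sentences that word appears in.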
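The aggregation relies on the fact that Counter objects can be added with +, which sums the counts key by key. A small standalone illustration (the three Counters are made up to resemble per-sentence results; passing an empty Counter as the start value is my choice so that the built-in sum works outside pandas):

```python
from collections import Counter

# Per-sentence Counters, as produced by applying Counter to a set of words.
a = Counter({"like": 1, "i": 1, "believe": 1})
b = Counter({"very": 1, "good": 1})
c = Counter({"like": 1, "good": 1})

# Adding Counters sums the counts of matching keys.
total = sum([a, b, c], Counter())
print(total["like"], total["good"], total["believe"])
# 2 2 1
```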

The rest just extracts the final output (converting it into a dictionary via .to_dict() and discarding the unnecessary first level).
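If a table is more convenient than a dictionary of Counters, the result can be turned back into a DataFrame with one row per word and one column per feeling. This is a sketch under the assumption that counts has the shape shown earlier; only a trimmed, hand-copied version is used here so the snippet runs on its own.

```python
from collections import Counter

import pandas as pd

# A trimmed copy of the `counts` dictionary (only a few words kept).
counts = {
    "Negative": Counter({"like": 2, "bad": 1}),
    "Neutral": Counter({"great": 2, "good": 1}),
    "Positive": Counter({"good": 3, "like": 2}),
}

# Outer keys become columns, inner keys become the row index.
# Words missing for a feeling come out as NaN, so fill them with 0.
table = (
    pd.DataFrame({feeling: dict(counter) for feeling, counter in counts.items()})
    .fillna(0)
    .astype(int)
)
print(table)
```

With this layout, table.loc["good"] gives the counts for a single word across all feelings.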


Solution 1
0 2020-11-26 21:53:24