Pandas: group by two text columns and return the max rows based on count

Question

I'm trying to find the max (First_Word, Group) pairs.

import pandas as pd

df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
           'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
           'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                'apple fell out of the tree', 'partrige in a pear tree']},
          columns=['First_Word', 'Group', 'Text'])

  First_Word         Group                        Text
0      apple    apple bins     where to buy apple bins
1      apple   apple trees         i see an apple tree
2     orange  orange juice         i like orange juice
3      apple   apple trees  apple fell out of the tree
4       pear     pear tree     partrige in a pear tree

Then I do a groupby:

grouped = df.groupby(['First_Word', 'Group']).count()
                         Text
First_Word Group             
apple      apple bins       1
           apple trees      2
orange     orange juice     1
pear       pear tree        1
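A side note on the groupby step above (not part of the original post): count() tallies the non-null values in each remaining column, which is why the result is a DataFrame with a Text column. size() tallies rows per group and returns a Series. Here is a minimal sketch, assuming the same df, showing that both give the same numbers because Text has no missing values:

```python
import pandas as pd

df = pd.DataFrame({
    'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
    'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
    'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
             'apple fell out of the tree', 'partrige in a pear tree'],
})

# count(): non-null values per remaining column -> DataFrame with a 'Text' column,
# indexed by the (First_Word, Group) MultiIndex.
counts = df.groupby(['First_Word', 'Group']).count()

# size(): number of rows per group, NaN or not -> Series with the same MultiIndex.
sizes = df.groupby(['First_Word', 'Group']).size()

# With no missing Text values the two agree row for row.
print(counts['Text'].tolist() == sizes.tolist())  # True
```

If Text could contain NaN, count() would undercount those rows while size() would not, so the choice matters for real data.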

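Now I want to filter this down to one row per First_Word, keeping only the row with the highest Text count. In the result below, notice that apple bins has been removed because apple trees has the higher count.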

                         Text
First_Word Group             
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1

This is similar to the "max value in group" question, but when I try something like this:

df.groupby(["First_Word", "Group"]).count().apply(lambda t: t[t['Text']==t['Text'].max()])

I get an error: KeyError: ('Text', 'occurred at index Text'). If I add axis=1 to the apply, I get IndexError: ('index out of bounds', 'occurred at index (apple, apple bins)') instead.
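A likely reason for both errors: with the default axis=0, DataFrame.apply calls the lambda once per column, so t is the whole Text column as a Series and t['Text'] looks for a row label named 'Text'. With axis=1, t is a single row instead. In neither case does the lambda see one First_Word group at a time. The sketch below applies the "max value in group" idea per group using transform('max'). This is a swapped-in technique, not the one from the question or the answer, and it assumes the same df and grouped as above:

```python
import pandas as pd

df = pd.DataFrame({
    'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
    'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
    'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
             'apple fell out of the tree', 'partrige in a pear tree'],
})
grouped = df.groupby(['First_Word', 'Group']).count()

# For every row, the largest Text count within its First_Word group,
# broadcast back onto grouped's (First_Word, Group) index.
group_max = grouped.groupby(level='First_Word')['Text'].transform('max')

# Keep the rows whose count equals their group's maximum.
# Unlike idxmax, this keeps every tied row rather than only the first.
result = grouped[grouped['Text'] == group_max]
print(result)
```

The boolean mask aligns on the shared MultiIndex, so no apply call is needed.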

Answer 1

Given grouped, you now want to group by the First_Word index level and find the index label of the maximum row in each group (using idxmax):

In [39]: grouped.groupby(level='First_Word')['Text'].idxmax()
Out[39]: 
First_Word
apple       (apple, apple trees)
orange    (orange, orange juice)
pear           (pear, pear tree)
Name: Text, dtype: object
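One behavior worth knowing: when two rows in a group tie for the maximum, idxmax returns only the label of the first one. The counts below are hypothetical (the question's data has no tie), built directly as a MultiIndexed DataFrame shaped like grouped:

```python
import pandas as pd

# Hypothetical counts with a tie inside the 'apple' group.
idx = pd.MultiIndex.from_tuples(
    [('apple', 'apple bins'), ('apple', 'apple trees'), ('pear', 'pear tree')],
    names=['First_Word', 'Group'],
)
counts = pd.DataFrame({'Text': [2, 2, 1]}, index=idx)

# idxmax returns the full (First_Word, Group) tuple of the first maximal row.
labels = counts.groupby(level='First_Word')['Text'].idxmax()
print(labels.tolist())
# [('apple', 'apple bins'), ('pear', 'pear tree')]
```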

Then you can use grouped.loc to select the rows of grouped by those index labels:

import pandas as pd
df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()
result = grouped.loc[grouped.groupby(level='First_Word')['Text'].idxmax()]
print(result)

yields

                         Text
First_Word Group             
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1
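For comparison, here is a sketch of another common pattern that avoids selecting on a MultiIndex with .loc: flatten the counts into columns, sort so each First_Word's largest count comes first, and keep the first row per First_Word. This is an alternative, not the answer's method. On ties, whichever row the sort places first survives.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
    'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
    'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
             'apple fell out of the tree', 'partrige in a pear tree'],
})

# Row counts per (First_Word, Group) as ordinary columns; 'Text' holds the count.
counts = df.groupby(['First_Word', 'Group']).size().reset_index(name='Text')

# Largest count first within each First_Word, then keep one row per First_Word.
result = (counts.sort_values(['First_Word', 'Text'], ascending=[True, False])
                .drop_duplicates(subset='First_Word')
                .set_index(['First_Word', 'Group']))
print(result)
```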


Solution 1 | 2 | accepted 2016-06-09 21:52:25
