Pandas GroupBy Two Text Columns And Return The Max Rows Based On Counts
I am trying to find the (First_Word, Group) pairs with the maximum count.
import pandas as pd
df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
                   'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
                   'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                            'apple fell out of the tree', 'partrige in a pear tree']},
                  columns=['First_Word', 'Group', 'Text'])
   First_Word         Group                        Text
0       apple    apple bins     where to buy apple bins
1       apple   apple trees         i see an apple tree
2      orange  orange juice         i like orange juice
3       apple   apple trees  apple fell out of the tree
4        pear     pear tree     partrige in a pear tree
Then I do a groupby:
grouped = df.groupby(['First_Word', 'Group']).count()
                         Text
First_Word Group
apple      apple bins       1
           apple trees      2
orange     orange juice     1
pear       pear tree        1
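Now I want to filter this down to only the index rows that have the maximum Text count within each First_Word. Below, you'll notice that apple bins has been removed because apple trees has the maximum value.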
                         Text
First_Word Group
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1
I tried the following:
df.groupby(["First_Word", "Group"]).count().apply(lambda t: t[t['Text']==t['Text'].max()])
but I get an error: KeyError: ('Text', 'occurred at index Text'). If I add axis=1 to apply, I get IndexError: ('index out of bounds', 'occurred at index (apple, apple bins)').
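A likely explanation for the KeyError: with the default axis=0, DataFrame.apply passes each column to the function as a Series, so t is the Text column itself (indexed by First_Word and Group) and t['Text'] looks for an index label named 'Text'. The sketch below only records what apply hands to the function; the names seen and record are hypothetical helpers introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
    'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
    'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
             'apple fell out of the tree', 'partrige in a pear tree'],
})
grouped = df.groupby(['First_Word', 'Group']).count()

seen = []  # hypothetical helper list, used only to record what apply passes in

def record(col):
    # Store the type, name, and index level names of the object apply hands us.
    seen.append((type(col).__name__, col.name, list(col.index.names)))
    return col.max()

grouped.apply(record)  # default axis=0: the function is called once per column
print(seen)
```

Each recorded entry describes a Series named Text rather than a DataFrame, which is why column-style lookup inside the lambda does not behave as intended.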
Given grouped, you now want to group by the First_Word index level and find the index label of the maximum row in each group (using idxmax):
In [39]: grouped.groupby(level='First_Word')['Text'].idxmax()
Out[39]:
First_Word
apple       (apple, apple trees)
orange    (orange, orange juice)
pear           (pear, pear tree)
Name: Text, dtype: object
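To see why the values above are tuples, here is a minimal sketch of idxmax on a single Series with a two-level index. The Series counts is built by hand for illustration and is not part of the original code; on a MultiIndex, idxmax returns the full index label of the first maximum, which is a tuple.

```python
import pandas as pd

# Hand-built counts for illustration, mirroring part of the grouped result.
index = pd.MultiIndex.from_tuples(
    [('apple', 'apple bins'), ('apple', 'apple trees'), ('orange', 'orange juice')],
    names=['First_Word', 'Group'],
)
counts = pd.Series([1, 2, 1], index=index, name='Text')

label = counts.idxmax()    # index label (a tuple) of the largest value
print(label)
print(counts.loc[label])   # the same label can be passed straight to .loc
```

Because the labels are complete index keys, a collection of them can be used directly with .loc to pull out the matching rows.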
You can then use grouped.loc to select the rows of grouped by those index labels:
import pandas as pd
df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])
# Count the rows for each (First_Word, Group) pair.
grouped = df.groupby(['First_Word', 'Group']).count()
# For each First_Word, look up the label of the row with the largest count and select it.
result = grouped.loc[grouped.groupby(level='First_Word')['Text'].idxmax()]
print(result)
yields
                         Text
First_Word Group
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1
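Note that idxmax returns only the first maximum in each group, so if two Group values tie for the top count under the same First_Word, only one of them is kept. If ties should be retained, one possible variation (a sketch, not part of the original answer) is to broadcast each group's maximum with transform('max') and keep every row that equals it. The names group_max and result_with_ties are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
    'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
    'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
             'apple fell out of the tree', 'partrige in a pear tree'],
})
grouped = df.groupby(['First_Word', 'Group']).count()

# Per-First_Word maximum count, aligned row by row with grouped.
group_max = grouped.groupby(level='First_Word')['Text'].transform('max')

# Keep every row whose count equals its group's maximum (ties included).
result_with_ties = grouped[grouped['Text'] == group_max]
print(result_with_ties)
```

For the sample data there are no ties, so this selects the same three rows as the idxmax approach.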