Pandas GroupBy Two Text Columns And Return The Max Rows Based On Counts
I'm trying to find the (First_Word, Group) pairs with the max count.
import pandas as pd

df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
                   'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
                   'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                            'apple fell out of the tree', 'partrige in a pear tree']},
                  columns=['First_Word', 'Group', 'Text'])
   First_Word         Group                        Text
0       apple    apple bins     where to buy apple bins
1       apple   apple trees         i see an apple tree
2      orange  orange juice         i like orange juice
3       apple   apple trees  apple fell out of the tree
4        pear     pear tree     partrige in a pear tree
Then I do a groupby:
grouped = df.groupby(['First_Word', 'Group']).count()
                         Text
First_Word Group
apple      apple bins       1
           apple trees      2
orange     orange juice     1
pear       pear tree        1
I now want to filter it down to only the unique index rows that have the max Text count. Below you'll notice apple bins was removed because apple trees has the max value.
                         Text
First_Word Group
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1
This "max value of group" question is similar, but when I try something like this:
df.groupby(["First_Word", "Group"]).count().apply(lambda t: t[t['Text']==t['Text'].max()])
I get an error: KeyError: ('Text', 'occurred at index Text'). If I add axis=1 to the apply, I get IndexError: ('index out of bounds', 'occurred at index (apple, apple bins)').
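For context on the KeyError, the following is a minimal sketch of what DataFrame.apply hands to the function when axis is left at its default. The names `received` and `inspect` are hypothetical helpers introduced only for this illustration; they are not from the original post.

```python
import pandas as pd

df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()

# Hypothetical helper: record the type and name of each object that
# DataFrame.apply passes to the function (default axis=0).
received = []

def inspect(col):
    received.append((type(col).__name__, col.name))
    return col

grouped.apply(inspect)
print(received)
```

With the default axis, the function receives the 'Text' column as a Series indexed by (First_Word, Group), not a DataFrame. Inside the lambda, t['Text'] is therefore a label lookup in that index, which is consistent with the KeyError reported above.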
Given grouped, you now want to group by the First_Word index level and find the index label of the maximum row in each group (using idxmax):
In [39]: grouped.groupby(level='First_Word')['Text'].idxmax()
Out[39]:
First_Word
apple       (apple, apple trees)
orange    (orange, orange juice)
pear           (pear, pear tree)
Name: Text, dtype: object
You can then use grouped.loc to select rows from grouped by index label:
import pandas as pd

df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()
result = grouped.loc[grouped.groupby(level='First_Word')['Text'].idxmax()]
print(result)
yields
                         Text
First_Word Group
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1
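One caveat: idxmax returns a single label per group, so if two Groups tie for the max count within a First_Word, only the first one is kept. If all tied rows are wanted, one possible variant is to broadcast each group's maximum with transform('max') and filter on equality. This is a sketch, not part of the approach above. The extra 'apple bins for sale' row is a hypothetical addition that creates a tie, and `group_max` and `result_with_ties` are illustrative names.

```python
import pandas as pd

# Same data as above, plus one hypothetical extra row so that
# 'apple bins' and 'apple trees' tie with a count of 2 each.
df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear', 'apple'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees',
               'pear tree', 'apple bins'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree',
              'apple bins for sale']},
    columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()

# Broadcast each First_Word's maximum count onto every row of that group.
group_max = grouped.groupby(level='First_Word')['Text'].transform('max')

# Keep every row whose count equals its group's maximum; ties are all kept.
result_with_ties = grouped[grouped['Text'] == group_max]
print(result_with_ties)
```

With this data both apple rows survive the filter, whereas the idxmax version would return only one of them.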