Pandas groupby two columns and output values from 3rd column
colour num accepted returned
grey 1 yes no
red 2 no no
grey 4 yes yes
I have the dataframe above and want to output the unique combinations of the colour and num columns, along with the corresponding value in returned, as below:
colour num returned
grey 1 no
red 2 no
grey 4 yes
Using df.groupby(['colour', 'num']).size() gives me the unique combinations but not the returned column.
If you're sure that the combination of colour and num is unique, you can just do:
df.groupby(['colour', 'num'])['returned'].max()
Of course, if it's not really unique and a group contains both a 'yes' and a 'no', this will return 'yes', because 'yes' > 'no' in string comparison.
But actually, this solution gives nothing more than df[['colour','num','returned']].drop_duplicates(), which is definitely leaner.
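As a rough sketch of how the two approaches compare, the snippet below rebuilds the question's dataframe and runs both. The variable names (by_max, deduped, conflict) are made up for illustration, and the extra conflicting row is an assumption added only to show where the results diverge.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "colour": ["grey", "red", "grey"],
    "num": [1, 2, 4],
    "accepted": ["yes", "no", "yes"],
    "returned": ["no", "no", "yes"],
})

# groupby + max: one row per (colour, num), sorted by the group keys.
by_max = df.groupby(["colour", "num"])["returned"].max().reset_index()

# drop_duplicates: one row per distinct (colour, num, returned), original order kept.
deduped = df[["colour", "num", "returned"]].drop_duplicates().reset_index(drop=True)

# Hypothetical extra row: (grey, 1) now has both 'no' and 'yes'.
extra = pd.DataFrame(
    {"colour": ["grey"], "num": [1], "accepted": ["yes"], "returned": ["yes"]}
)
conflict = pd.concat([df, extra], ignore_index=True)

# max() collapses the conflicting group to 'yes'; drop_duplicates keeps both rows.
conflict_max = conflict.groupby(["colour", "num"])["returned"].max()
conflict_dedup = conflict[["colour", "num", "returned"]].drop_duplicates()
```

On the original data both give the same three rows (in a different order); with the conflicting row, max() silently picks 'yes' while drop_duplicates() returns four rows.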
If you know that rows can be repeated but the returned value is unique within each group, and you want both the number of times each combination appears and its value in "returned", you can get them in one go with:
df.groupby(['colour','num'])['returned'].agg(['size','max'])
Which would return:
            size  max
colour num
grey   1       1   no
       4       1  yes
red    2       1   no
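The aggregation above relies on returned being unique within each group. A minimal sketch of one way to check that assumption is to add 'nunique' to the same agg call; the names summary and ambiguous are illustrative and not from the original answer.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "colour": ["grey", "red", "grey"],
    "num": [1, 2, 4],
    "accepted": ["yes", "no", "yes"],
    "returned": ["no", "no", "yes"],
})

# Row count, max value, and number of distinct 'returned' values per group.
summary = df.groupby(["colour", "num"])["returned"].agg(["size", "max", "nunique"])

# Groups where 'returned' is not unique, so 'max' would hide a 'no'.
ambiguous = summary[summary["nunique"] > 1]
```

If ambiguous is empty, the max column can be read as the single returned value for each combination.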
From your description, I think you should group by the returned column as well.
df.groupby(['colour','num','returned']).size()
This will display the number of occurrences of each returned status, grouped by num and colour:
colour  num  returned
grey    1    no          1
        4    yes         1
red     2    no          1
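The result of size() is a Series with a three-level index. If a flat table like the one in the question is wanted, one possible approach is to call reset_index on it; the column name "count" below is an arbitrary choice.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "colour": ["grey", "red", "grey"],
    "num": [1, 2, 4],
    "accepted": ["yes", "no", "yes"],
    "returned": ["no", "no", "yes"],
})

# Count rows per (colour, num, returned), then turn the index levels into columns.
counts = (
    df.groupby(["colour", "num", "returned"])
      .size()
      .reset_index(name="count")
)

# Drop the count column if only the combinations themselves are needed.
combos = counts[["colour", "num", "returned"]]
```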