How to feed a list as an input to a groupby function in a pandas DataFrame
Suppose a subset of a dataset comprises these two columns:
   attacker_king             attacker_commander
0  Joffrey/Tommen Baratheon  Jaime Lannister
1  Joffrey/Tommen Baratheon  Gregor Clegane
2  Joffrey/Tommen Baratheon  Jaime Lannister, Andros Brax
3  Robb Stark                Roose Bolton, Wylis Manderly, Medger Cerwyn
4  Robb Stark                Robb Stark, Brynden Tully
5  Robb Stark                Robb Stark, Tytos Blackwood, Brynden Tully
My objective is to get the set of commanders that each king deploys, according to the dataset.
[x for x in battles['attacker_commander'].dropna().str.split(',').sum()]
The command above only produces one flat list of all commanders, with no link to their kings. But if I instead group by king and sum,
battles[['attacker_commander','attacker_king']].groupby('attacker_king').sum()
I get this output:
attacker_king             attacker_commander
Balon/Euron Greyjoy       Victarion GreyjoyAsha GreyjoyTheon GreyjoyTheo...
Joffrey/Tommen Baratheon  Jaime LannisterGregor CleganeJaime Lannister, ...
Robb Stark                Roose Bolton, Wylis Manderly, Medger Cerwyn, H...
Stannis Baratheon         Stannis Baratheon, Davos SeaworthStannis Barat...
The problem with this approach is that when a row holds just one commander and is summed with the next row, the strings are concatenated with no separator, so the output looks like 'Victarion GreyjoyAsha Greyjoy' instead of 'Victarion Greyjoy,Asha Greyjoy'. So does it make sense to use the list created using
[x for x in battles['attacker_commander'].dropna().str.split(',').sum()]
and feed it to a groupby('attacker_king'), or what approach would you suggest?
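To make the problem concrete, here is a minimal sketch that rebuilds the six rows shown above (the DataFrame is typed in by hand as an assumption, since the full dataset is not part of this post) and reproduces the missing separator:

```python
import pandas as pd

# Sample rows copied from the question; the real dataset has more kings and columns.
battles = pd.DataFrame({
    'attacker_king': ['Joffrey/Tommen Baratheon'] * 3 + ['Robb Stark'] * 3,
    'attacker_commander': [
        'Jaime Lannister',
        'Gregor Clegane',
        'Jaime Lannister, Andros Brax',
        'Roose Bolton, Wylis Manderly, Medger Cerwyn',
        'Robb Stark, Brynden Tully',
        'Robb Stark, Tytos Blackwood, Brynden Tully',
    ],
})

# Summing strings concatenates them row by row with nothing in between,
# so names from adjacent rows run together.
summed = battles.groupby('attacker_king')['attacker_commander'].sum()
print(summed['Joffrey/Tommen Baratheon'])
```

The commas that do appear in the result come from inside the original cells, not from the sum itself.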
I think you need apply with the function join first:
battles.groupby('attacker_king')['attacker_commander'].apply(','.join)
If you need to remove NaN values:
battles.groupby('attacker_king')['attacker_commander'].apply(lambda x: ','.join(x.dropna()))
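As a rough illustration of why the dropna step matters, the sketch below uses a tiny hand-made frame (an assumption, not the real data) with one missing commander. Joining a column that contains NaN fails because NaN is a float, while dropping it first works:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample with one missing value.
battles = pd.DataFrame({
    'attacker_king': ['Robb Stark', 'Robb Stark', 'Robb Stark'],
    'attacker_commander': ['Robb Stark, Brynden Tully', np.nan, 'Roose Bolton'],
})

# str.join only accepts strings, so a NaN in the column raises TypeError.
try:
    ','.join(battles['attacker_commander'])
    join_failed = False
except TypeError:
    join_failed = True

# Dropping NaN inside each group avoids the error.
joined = (battles.groupby('attacker_king')['attacker_commander']
                 .apply(lambda x: ','.join(x.dropna())))
print(joined['Robb Stark'])
```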
Then split the joined string and use set to keep only the unique values:
# the parentheses allow the method chain to continue on the next line
df = (battles.groupby('attacker_king')['attacker_commander']
             .apply(lambda x: list(set(','.join(x.dropna()).split(',')))))
print(df)
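One thing to watch: the sample cells separate names with a comma followed by a space, so splitting on ',' alone leaves a leading space on some names, and ' Brynden Tully' would then count as different from 'Brynden Tully'. A possible variation, sketched here with a hypothetical helper named unique_commanders and hand-typed sample rows, strips each piece before building the set:

```python
import pandas as pd

# Sample rows typed in from the question (assumed, not the full dataset).
battles = pd.DataFrame({
    'attacker_king': ['Joffrey/Tommen Baratheon'] * 3 + ['Robb Stark'] * 3,
    'attacker_commander': [
        'Jaime Lannister',
        'Gregor Clegane',
        'Jaime Lannister, Andros Brax',
        'Roose Bolton, Wylis Manderly, Medger Cerwyn',
        'Robb Stark, Brynden Tully',
        'Robb Stark, Tytos Blackwood, Brynden Tully',
    ],
})

def unique_commanders(names):
    # Join the group's cells, split on commas, and strip stray whitespace.
    parts = ','.join(names.dropna()).split(',')
    # A set removes duplicates; sorting just makes the result deterministic.
    return sorted({p.strip() for p in parts if p.strip()})

result = battles.groupby('attacker_king')['attacker_commander'].apply(unique_commanders)
print(result)
```

Sorting is only a convenience here, since a plain set has no guaranteed order.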
For debugging, the easiest approach is to write a custom function first and then rewrite the code as a lambda:
def f(x):
    # Series of attacker_commander values for one group
    print(x)
    # first remove NaN
    print(x.dropna())
    # join by ,
    print(','.join(x.dropna()))
    # create list by split
    print(','.join(x.dropna()).split(','))
    # convert to set - unique values
    print(set(','.join(x.dropna()).split(',')))
    # convert set to list
    print(list(set(','.join(x.dropna()).split(','))))
    return list(set(','.join(x.dropna()).split(',')))

df = battles.groupby('attacker_king')['attacker_commander'].apply(f)
print(df)
Another possible solution is to first remove the rows with NaN in that column using DataFrame.dropna:
def f(x):
    return list(set(','.join(x).split(',')))

df = battles.dropna(subset=['attacker_commander']).groupby('attacker_king')['attacker_commander'].apply(f)
print(df)
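A different route, not the one used above, is to split each cell into a list and give every commander its own row with DataFrame.explode (available in pandas 0.25 and later), then take the unique values per group. This is only a sketch on hand-typed sample rows, one of which is assumed to be missing:

```python
import numpy as np
import pandas as pd

# Sample rows typed in from the question, plus one assumed NaN row.
battles = pd.DataFrame({
    'attacker_king': ['Joffrey/Tommen Baratheon'] * 3 + ['Robb Stark'] * 3,
    'attacker_commander': [
        'Jaime Lannister',
        'Gregor Clegane',
        'Jaime Lannister, Andros Brax',
        'Robb Stark, Brynden Tully',
        np.nan,
        'Robb Stark, Tytos Blackwood, Brynden Tully',
    ],
})

exploded = (
    battles.dropna(subset=['attacker_commander'])
           # turn each cell into a list of names
           .assign(attacker_commander=lambda d: d['attacker_commander'].str.split(','))
           # one row per (king, commander) pair
           .explode('attacker_commander')
)
# remove the space left behind after each comma
exploded['attacker_commander'] = exploded['attacker_commander'].str.strip()

# unique() keeps names in order of first appearance within each group
result = exploded.groupby('attacker_king')['attacker_commander'].unique()
print(result)
```

The long one-row-per-commander frame is also handy if you later want counts per commander rather than just the set.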
You want to join the strings by group, then split and find the unique values.
df.groupby(
'attacker_king'
).attacker_commander.apply(','.join).str.split(',').apply(pd.unique)
attacker_king
Joffrey/Tommen Baratheon [Jaime Lannister, Gregor Clegane, Andros Brax]
Robb Stark [Roose Bolton, Wylis Manderly, Medger Cerwyn...
Name: attacker_commander, dtype: object
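Depending on the pandas version, passing a plain Python list to pd.unique may trigger a deprecation warning or be rejected, and splitting on ',' keeps the space after each comma. A hedged variation of the same chain, using a hypothetical helper ordered_unique built on dict.fromkeys and hand-typed sample rows, avoids both:

```python
import pandas as pd

# Sample rows typed in from the question (assumed, not the full dataset).
df = pd.DataFrame({
    'attacker_king': ['Joffrey/Tommen Baratheon'] * 3 + ['Robb Stark'] * 3,
    'attacker_commander': [
        'Jaime Lannister',
        'Gregor Clegane',
        'Jaime Lannister, Andros Brax',
        'Roose Bolton, Wylis Manderly, Medger Cerwyn',
        'Robb Stark, Brynden Tully',
        'Robb Stark, Tytos Blackwood, Brynden Tully',
    ],
})

def ordered_unique(names):
    # dict keys are unique and keep insertion order, so this removes
    # duplicates while preserving first appearance.
    return list(dict.fromkeys(n.strip() for n in names))

result = (
    df.groupby('attacker_king')['attacker_commander']
      .apply(','.join)       # one string per king
      .str.split(',')        # back to a list of names
      .apply(ordered_unique)
)
print(result)
```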