How to feed a list as an input to a groupby function in pandas dataframe

Suppose a subset of the dataset contains these 2 columns:

     attacker_king              attacker_commander
0   Joffrey/Tommen Baratheon    Jaime Lannister
1   Joffrey/Tommen Baratheon    Gregor Clegane
2   Joffrey/Tommen Baratheon    Jaime Lannister, Andros Brax
3   Robb Stark                  Roose Bolton, Wylis Manderly, Medger Cerwyn
4   Robb Stark                  Robb Stark, Brynden Tully
5   Robb Stark                  Robb Stark, Tytos Blackwood, Brynden Tully

My goal is to get the commanders deployed by each king, based on the dataset.

[x for x in battles['attacker_commander'].dropna().str.split(',').sum()]

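The command above only returns one flat list of the comma-separated commanders, with no link to the king. But if I instead group by king and sum,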

battles[['attacker_commander','attacker_king']].groupby('attacker_king').sum()

I get this output:

attacker_king                      attacker_commander   
Balon/Euron Greyjoy         Victarion GreyjoyAsha GreyjoyTheon GreyjoyTheo...
Joffrey/Tommen Baratheon    Jaime LannisterGregor CleganeJaime Lannister, ...
Robb Stark                  Roose Bolton, Wylis Manderly, Medger Cerwyn, H...
Stannis Baratheon           Stannis Baratheon, Davos SeaworthStannis Barat...

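The problem with this approach is that when a row has only one commander, summing it with the next row produces "Victarion GreyjoyAsha Greyjoy" instead of "Victarion Greyjoy, Asha Greyjoy". So does it make sense to use the list created with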

[x for x in battles['attacker_commander'].dropna().str.split(',').sum()]

and feed it to groupby('attacker_king'), or what approach would you suggest?

I think you need to apply the join function to each group first:

battles.groupby('attacker_king')['attacker_commander'].apply(','.join)

If you need to remove NaN values:

battles.groupby('attacker_king')['attacker_commander'].apply(lambda x: ','.join(x.dropna()))
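For anyone who wants to try this without the full dataset, here is a minimal sketch. The `battles` frame below is a hypothetical sample modeled on the table in the question, with two missing values added for illustration, and it joins with `', '` rather than `','` purely so the result reads naturally.

```python
import numpy as np
import pandas as pd

# Hypothetical sample modeled on the question's table, plus two missing values.
battles = pd.DataFrame({
    "attacker_king": [
        "Joffrey/Tommen Baratheon",
        "Joffrey/Tommen Baratheon",
        "Joffrey/Tommen Baratheon",
        "Robb Stark",
        "Robb Stark",
        "Robb Stark",
    ],
    "attacker_commander": [
        "Jaime Lannister",
        "Gregor Clegane",
        np.nan,
        "Roose Bolton, Wylis Manderly, Medger Cerwyn",
        "Robb Stark, Brynden Tully",
        np.nan,
    ],
})

# Drop missing commanders inside each group, then join with an explicit separator.
joined = (
    battles.groupby("attacker_king")["attacker_commander"]
    .apply(lambda s: ", ".join(s.dropna()))
)
print(joined)
```

Without the `dropna()` call, `str.join` would hit a float NaN among the strings and fail, which is why the NaN handling comes before the join.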

Then split and use set to keep only the unique values:

df = (battles.groupby('attacker_king')['attacker_commander']
             .apply(lambda x: list(set(','.join(x.dropna()).split(',')))))
print (df)
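One caveat worth checking on your data: the cells are separated by a comma plus a space, so splitting on `','` alone leaves a leading space on every name after the first, and `set` then treats " Brynden Tully" and "Brynden Tully" as different commanders. A `set` also does not keep the original order. The sketch below is one possible variant, not part of the answer above; the helper name `unique_commanders` and the sample rows are made up for illustration.

```python
import pandas as pd

def unique_commanders(names: pd.Series) -> list:
    """Split comma-separated names, strip spaces, drop repeats, keep first-seen order."""
    parts = [p.strip() for cell in names.dropna() for p in cell.split(",")]
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(p for p in parts if p))

# Hypothetical rows: "Brynden Tully" appears both with and without a leading space
# once the cells are split on ",".
battles = pd.DataFrame({
    "attacker_king": ["Robb Stark", "Robb Stark", "Robb Stark"],
    "attacker_commander": [
        "Brynden Tully",
        "Robb Stark, Brynden Tully",
        "Robb Stark, Tytos Blackwood, Brynden Tully",
    ],
})

result = battles.groupby("attacker_king")["attacker_commander"].apply(unique_commanders)
print(result["Robb Stark"])
```

If order does not matter and the names never carry stray spaces, the shorter `set` version is enough.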

The best way to debug this is to write a custom function first, then rewrite the code as a lambda:

def f(x):
    #Series by attacker_commander per group
    print (x)
    #first remove NaN
    print (x.dropna())
    #join by ,
    print (','.join(x.dropna()))
    #create list by split
    print (','.join(x.dropna()).split(','))
    #convert to set - unique values
    print (set(','.join(x.dropna()).split(',')))
    #set convert to list
    print (list(set(','.join(x.dropna()).split(','))))
    return list(set(','.join(x.dropna()).split(',')))

df = battles.groupby('attacker_king')['attacker_commander'].apply(f)
print (df)

Another possible solution is to first remove the rows with NaN in that column using DataFrame.dropna:

def f(x):
    return list(set(','.join(x).split(',')))

df = battles.dropna(subset=['attacker_commander']).groupby('attacker_king')['attacker_commander'].apply(f)
print (df)

You want to join the strings within each group, then split and find the unique values.

df.groupby(
    'attacker_king'
).attacker_commander.apply(','.join).str.split(',').apply(pd.unique)

attacker_king
Joffrey/Tommen Baratheon      [Jaime Lannister, Gregor Clegane,  Andros Brax]
Robb Stark                  [Roose Bolton,  Wylis Manderly,  Medger Cerwyn...
Name: attacker_commander, dtype: object
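Note the leading spaces in the output above (for example " Andros Brax"), which come from splitting on `','` when the cells use a comma plus a space. As an alternative sketch, assuming a pandas version that has `DataFrame.explode` (0.25 or later), you could split first, put one commander per row, strip the spaces, and only then group. This is a different route from the answers above, and the sample frame is a hypothetical reconstruction of the question's table.

```python
import pandas as pd

# Hypothetical reconstruction of the six rows shown in the question.
battles = pd.DataFrame({
    "attacker_king": ["Joffrey/Tommen Baratheon"] * 3 + ["Robb Stark"] * 3,
    "attacker_commander": [
        "Jaime Lannister",
        "Gregor Clegane",
        "Jaime Lannister, Andros Brax",
        "Roose Bolton, Wylis Manderly, Medger Cerwyn",
        "Robb Stark, Brynden Tully",
        "Robb Stark, Tytos Blackwood, Brynden Tully",
    ],
})

# Split each cell into a list, then give every commander its own row.
commanders = (
    battles.dropna(subset=["attacker_commander"])
    .assign(attacker_commander=lambda d: d["attacker_commander"].str.split(","))
    .explode("attacker_commander")
)
# Remove the space left behind by the ", " separator.
commanders["attacker_commander"] = commanders["attacker_commander"].str.strip()

# Unique commanders per king, in order of first appearance.
per_king = commanders.groupby("attacker_king")["attacker_commander"].unique()
print(per_king)
```

The long one-commander-per-row form is also handy if you later want counts, for example with `value_counts()` per king.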
