
pandas groupby and map list of values

There are two dataframes: the first contains person names and a list of cells for each person, and the second contains the actual values to be mapped.

df1:

Name           celllist
Bob            ['a', 'v']
April          ['b', 'c']
Amy            ['v']
Linda          ['g', 'r']

df2:

Name    cell    value
Bob      a       4
Bob      g       6
Bob      v       8
Arpil    a       6
Arpil    g       8
Arpil    b       9
Arpil    c       1
Amy      v       2
Amy      b       2

This is what I would like to have: for each element of celllist, extract the matching value from df2 and add the resulting list back to df1.

Expected result (df1):

Name           Group          Group_Name
Bob            ['a', 'v']         [4, 8]       
April          ['b', 'c']         [9, 1]
Amy            ['v']              [2]
Linda          ['g', 'r']         [None, None]

Can someone help me get this, or suggest a better solution?
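To reproduce the question, the two input frames can be rebuilt from the tables above. This is a minimal sketch that assumes "Arpil" in df2 is a typo for "April": spelled as shown, April's rows could never match df1 on Name, yet the expected result lists values for April.

```python
import pandas as pd

# df1: one row per person, with a Python list of cells in each row.
df1 = pd.DataFrame({
    "Name": ["Bob", "April", "Amy", "Linda"],
    "celllist": [["a", "v"], ["b", "c"], ["v"], ["g", "r"]],
})

# df2: one row per (Name, cell) pair with the value to look up.
# Assumption: "Arpil" in the question is read as "April".
df2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "April", "April", "April", "April", "Amy", "Amy"],
    "cell": ["a", "g", "v", "a", "g", "b", "c", "v", "b"],
    "value": [4, 6, 8, 6, 8, 9, 1, 2, 2],
})

print(df1)
print(df2)
```

The answers below refer to df1's list column by different names (Group in the first answer, celllist in the last), so rename it to match whichever snippet you run, e.g. df1.rename(columns={"celllist": "Group"}).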

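IIUC, you need to keep only the rows whose cell appears in the person's list, plus the NaN rows for people who have no entries in df2. Assuming df1's list column is named Group (as in the expected output), do: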

# create DataFrame to check which values of cell are in Group
res = df2.merge(df1, on='Name', how='right').explode('Group')

# create mask 
mask = res['Group'].eq(res['cell']) | res['cell'].isna()

# filter, group by, agg and rename
output = res[mask].groupby('Name', sort=False).agg({'Group': list, 'value': list}).rename(
    columns={'value': 'Group Name'})
print(output)

Output

        Group  Group Name
Name                     
Bob    [a, v]  [4.0, 8.0]
April  [b, c]  [9.0, 1.0]
Amy       [v]       [2.0]
Linda  [g, r]  [nan, nan]
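A self-contained version of the approach above may help when checking its behaviour. It is a sketch on the question's data, with df1's list column named Group and "Arpil" read as "April"; mask_approach is a hypothetical wrapper name, not part of the answer. One thing worth checking: the mask keeps NaN only for people who have no rows in df2 at all. For a person who does appear in df2, a list element with no matching cell fails both conditions and is silently dropped; the second call uses a hypothetical Amy list with an extra "z" to show this.

```python
import pandas as pd

# Question data; df1's list column is named "Group" (as the answer assumes)
# and "Arpil" is read as "April".
df1 = pd.DataFrame({
    "Name": ["Bob", "April", "Amy", "Linda"],
    "Group": [["a", "v"], ["b", "c"], ["v"], ["g", "r"]],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "April", "April", "April", "April", "Amy", "Amy"],
    "cell": ["a", "g", "v", "a", "g", "b", "c", "v", "b"],
    "value": [4, 6, 8, 6, 8, 9, 1, 2, 2],
})


def mask_approach(df1, df2):
    """Hypothetical wrapper around the answer's steps: merge, explode, mask, group."""
    res = df2.merge(df1, on="Name", how="right").explode("Group")
    # Keep rows whose cell matches the list element, or people absent from df2.
    mask = res["Group"].eq(res["cell"]) | res["cell"].isna()
    return (res[mask].groupby("Name", sort=False)
            .agg({"Group": list, "value": list})
            .rename(columns={"value": "Group Name"}))


print(mask_approach(df1, df2))

# Hypothetical edge case: Amy's list gains "z", which has no row in df2.
# Amy does have rows in df2, so "cell" is never NaN for her and "z" is dropped.
amy_extra = pd.DataFrame({"Name": ["Amy"], "Group": [["v", "z"]]})
print(mask_approach(amy_extra, df2))
```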

UPDATE

The version above returns each person's values in the order their rows appear in df2. To keep the original list order instead, you could add an additional step:

import numpy as np

# create DataFrame to check which values of cell are in Group
res = df1.merge(df2, on='Name', how='left').explode('Group', ignore_index=True)

# reorder DataFrame to keep original list order
res['ord'] = np.arange(len(res))
res['ord'] = res.groupby(['Name', 'Group'])['ord'].transform('first')
# drop(columns=...) instead of drop('ord', 1): a positional axis is not accepted in pandas 2.0+
res = res.sort_values(by='ord').drop(columns='ord')

# create mask
mask = res['Group'].eq(res['cell']) | res['cell'].isna()

# filter, group by, agg and rename
output = res[mask].groupby('Name', sort=False).agg({'Group': list, 'value': list}).rename(
    columns={'value': 'Group Name'})
print(output)
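To see why the extra step matters, here is a sketch on hypothetical data in which Bob's df2 rows are stored as v, g, a, i.e. not in the order of his list ['a', 'v']. The first version follows df2's row order, so it may return ['v', 'a']; the UPDATE version re-sorts by each element's first position in the exploded frame. The helper name finish is hypothetical, and kind="stable" in sort_values is my addition (so tied rows also keep their relative order), not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob's df2 rows are deliberately not in list order.
df1 = pd.DataFrame({"Name": ["Bob", "Linda"], "Group": [["a", "v"], ["g", "r"]]})
df2 = pd.DataFrame({"Name": ["Bob", "Bob", "Bob"],
                    "cell": ["v", "g", "a"],
                    "value": [8, 6, 4]})


def finish(res):
    """Keep matching rows (or people missing from df2) and collect lists per Name."""
    mask = res["Group"].eq(res["cell"]) | res["cell"].isna()
    return (res[mask].groupby("Name", sort=False)
            .agg({"Group": list, "value": list})
            .rename(columns={"value": "Group Name"}))


# First version: the row order comes from df2.
plain = finish(df2.merge(df1, on="Name", how="right").explode("Group"))

# UPDATE version: tag each row with the first position of its (Name, Group)
# pair in the exploded frame, then sort by that tag.
res = df1.merge(df2, on="Name", how="left").explode("Group", ignore_index=True)
res["ord"] = np.arange(len(res))
res["ord"] = res.groupby(["Name", "Group"])["ord"].transform("first")
res = res.sort_values(by="ord", kind="stable").drop(columns="ord")
ordered = finish(res)

print(plain)
print(ordered)
```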

I believe a few steps in Dani's method above can be avoided entirely, and the mask is unnecessary. If the list column of the input dataframe is flattened with explode before merging, the merge can match on both Name and cell: rows of the larger dataframe whose keys don't need matching are simply ignored, the intermediate result stays as small as possible, and a cell with no match in df2 comes back as NaN instead of being dropped. I believe this solves the problem better and faster.

# Assumes df is the first dataframe and df2 is the larger second dataframe
df.explode('celllist').merge(
    df2, how='left',
    left_on=['Name', 'celllist'],
    right_on=['Name', 'cell']
).drop(columns=['cell']).groupby('Name', sort=False).agg(
    {'celllist': list, 'value': list}).rename(
    columns={'celllist': 'Group', 'value': 'Group Name'})

This yields exactly the output the OP wanted.

Output:

        Group   Group Name
Name        
Bob     [a, v]  [4.0, 8.0]
April   [b, c]  [9.0, 1.0]
Amy     [v]     [2.0]
Linda   [g, r]  [nan, nan]
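For comparison, here is a self-contained sketch of the explode-then-merge approach, again reading "Arpil" as "April"; explode_then_merge is a hypothetical helper name. Amy's list is given a hypothetical extra cell "z" with no row in df2, to show that an unmatched element comes back as NaN instead of disappearing, and the row counts printed at the end compare the intermediate frames of the two approaches on this data. validate="many_to_one" is an optional addition of mine: it makes merge raise a MergeError if df2 contains a duplicated (Name, cell) pair, which would otherwise silently duplicate entries in the output lists.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Bob", "April", "Amy", "Linda"],
    # Hypothetical: "z" added to Amy's list; it has no row in df2.
    "celllist": [["a", "v"], ["b", "c"], ["v", "z"], ["g", "r"]],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "April", "April", "April", "April", "Amy", "Amy"],
    "cell": ["a", "g", "v", "a", "g", "b", "c", "v", "b"],
    "value": [4, 6, 8, 6, 8, 9, 1, 2, 2],
})


def explode_then_merge(df1, df2):
    # One row per (Name, cell) pair wanted from df2.
    flat = df1.explode("celllist")
    # Left merge on both keys: each wanted pair picks up at most one value,
    # and pairs with no match get NaN.
    merged = flat.merge(
        df2, how="left",
        left_on=["Name", "celllist"], right_on=["Name", "cell"],
        validate="many_to_one",
    )
    return (merged.groupby("Name", sort=False)
            .agg({"celllist": list, "value": list})
            .rename(columns={"celllist": "Group", "value": "Group Name"}))


print(explode_then_merge(df1, df2))

# Size of the intermediate frame in each approach.
mask_rows = len(df2.merge(df1, on="Name", how="right").explode("celllist"))
lean_rows = len(df1.explode("celllist").merge(
    df2, how="left", left_on=["Name", "celllist"], right_on=["Name", "cell"]))
print(mask_rows, lean_rows)
```

On the question's original data (Amy's list just ['v']) the same function gives the output shown above.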

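Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM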