
Pandas: Join if value of df1 column is in list of df2 column

Suppose we have two Pandas DataFrames as follows:

import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df1
    id
0   a
1   b
2   c

df2 = pd.DataFrame({'ids': [['b','c'], ['a', 'b'], ['a', 'z']], 
                    'info': ['asdf', 'zxcv', 'sdfg']})
df2
    ids     info
0   [b, c]  asdf
1   [a, b]  zxcv
2   [a, z]  sdfg

How do I join/merge the rows of df1 with the rows of df2 where df1.id is in df2.ids?

In other words, how do I achieve the following:

df3
   id   ids     info
0  a    [a, b]  zxcv
1  a    [a, z]  sdfg
2  b    [b, c]  asdf
3  b    [a, b]  zxcv
4  c    [b, c]  asdf

And also a version of the above aggregated on id, like so:

df3
   id   ids               info
0  a    [[a, b], [a, z]]  [zxcv, sdfg]
1  b    [[b, c], [a, b]]  [asdf, zxcv]
2  c    [[b, c]]          [asdf]

I tried the following:

df1.merge(df2, how = 'left', left_on = 'id', right_on = 'ids')
TypeError: unhashable type: 'list'

df1.id.isin(df2.ids)
TypeError: unhashable type: 'list'
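
As the error messages indicate, both attempts fail because the list values in df2.ids cannot be hashed. The sketch below illustrates that, then shows a row-by-row membership test for a single id, which never hashes the lists. The names list_is_hashable, mask and matches are illustrative only and not part of the original question.

```python
import pandas as pd

df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

# Python lists are mutable and therefore unhashable,
# which is what the TypeError above reports.
try:
    hash(['a', 'b'])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A plain membership test per row avoids hashing the lists.
mask = df2['ids'].apply(lambda ids: 'a' in ids)
matches = df2[mask]
print(matches)
```

This only answers the question for one id at a time; the answers below handle all ids in one pass.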

Using stack, merge and groupby.agg:

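# Expand each list into columns, stack to long form, then merge back
df = (df2.set_index('info')['ids'].apply(pd.Series)  # one column per list element
         .stack().reset_index(0, name='id')          # one row per (info, id) pair
         .merge(df2)                                  # recover the original 'ids' lists
         .merge(df1, how='right').sort_values('id')   # keep only ids present in df1
         .reset_index(drop=True))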

print(df)
   info id     ids
0  zxcv  a  [a, b]
1  sdfg  a  [a, z]
2  asdf  b  [b, c]
3  zxcv  b  [a, b]
4  asdf  c  [b, c]

For aggregation, use:

df = df.groupby('id', as_index=False).agg(list)

print(df)
  id          info               ids
0  a  [zxcv, sdfg]  [[a, b], [a, z]]
1  b  [asdf, zxcv]  [[b, c], [a, b]]
2  c        [asdf]          [[b, c]]
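
On pandas 0.25 or later, DataFrame.explode offers a shorter route to the same long-form table. This is a sketch of an alternative, not the method used in the answer above; the names exploded, joined and aggregated are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

# Copy the lists into a new 'id' column and explode only that copy,
# so the original list stays available in 'ids'.
exploded = df2.assign(id=df2['ids']).explode('id')

# The exploded 'id' values are scalars, so an ordinary merge works.
joined = df1.merge(exploded, on='id', how='left')[['id', 'ids', 'info']]

# Collapse back to one row per id.
aggregated = joined.groupby('id', as_index=False).agg(list)

print(joined)
print(aggregated)
```

Ids that appear only in df2 (such as 'z') are dropped by the left merge, and an id in df1 with no match would appear with missing values.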

Use:

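# Split each list into columns id1 and id2 (assumes exactly two ids per row)
df2[['id1','id2']] = pd.DataFrame(df2.ids.values.tolist(), index=df2.index)
new_df1 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on=['id1'])
new_df2 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on=['id2'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
new_df = pd.concat([new_df1, new_df2])[['id','ids','info']]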

Output

  id     ids  info
0  a  [a, b]  zxcv
1  a  [a, z]  sdfg
2  b  [b, c]  asdf
0  b  [a, b]  zxcv
1  c  [b, c]  asdf

Aggregation Part

new_df.groupby('id')[['ids', 'info']].agg(list)

Output

                 ids          info
id
a   [[a, b], [a, z]]  [zxcv, sdfg]
b   [[b, c], [a, b]]  [asdf, zxcv]
c           [[b, c]]        [asdf]
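
The id1/id2 split above assumes every list holds exactly two elements. For lists of any length, one possible sketch is a cross join followed by a membership filter. It assumes pandas 1.2 or later for how='cross', and the names pairs, keep and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

# Pair every df1 row with every df2 row (needs pandas >= 1.2).
pairs = df1.merge(df2, how='cross')

# Keep only the pairs where the id occurs in the list, whatever its length.
keep = [i in ids for i, ids in zip(pairs['id'], pairs['ids'])]
result = pairs.loc[keep].reset_index(drop=True)

print(result)
```

The cross join builds len(df1) * len(df2) rows before filtering, so this is only practical for small frames.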

