Pandas: Join if value of df1 column is in list of df2 column
Suppose we have two Pandas DataFrames as follows:
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df1
id
0 a
1 b
2 c
df2 = pd.DataFrame({'ids': [['b','c'], ['a', 'b'], ['a', 'z']],
'info': ['asdf', 'zxcv', 'sdfg']})
df2
ids info
0 [b, c] asdf
1 [a, b] zxcv
2 [a, z] sdfg
How do I join/merge the rows of df1 with df2 where df1.id is in df2.ids?
In other words, how do I achieve the following:
df3
id ids info
0 a [a, b] zxcv
1 a [a, z] sdfg
2 b [b, c] asdf
3 b [a, b] zxcv
4 c [b, c] asdf
And also a version of the above aggregated on id, like so:
df3
id ids info
0 a [[a, b], [a, z]] [zxcv, sdfg]
1 b [[b, c], [a, b]] [asdf, zxcv]
2 c [[b, c]] [asdf]
I tried the following:
df1.merge(df2, how = 'left', left_on = 'id', right_on = 'ids')
TypeError: unhashable type: 'list'
df1.id.isin(df2.ids)
TypeError: unhashable type: 'list'
Using stack, merge and groupby.agg:
df = df2.set_index('info').ids.apply(pd.Series)\
.stack().reset_index(0, name='id').merge(df2)\
.merge(df1, how='right').sort_values('id')\
.reset_index(drop=True)
print(df)
info id ids
0 zxcv a [a, b]
1 sdfg a [a, z]
2 asdf b [b, c]
3 zxcv b [a, b]
4 asdf c [b, c]
For aggregation use:
df = df.groupby('id', as_index=False).agg(list)
print(df)
id info ids
0 a [zxcv, sdfg] [[a, b], [a, z]]
1 b [asdf, zxcv] [[b, c], [a, b]]
2 c [asdf] [[b, c]]
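On pandas 0.25 or later, the stack step above can probably be replaced with DataFrame.explode. The following is a minimal sketch of that alternative, not part of the original answer; the names exploded, joined and aggregated are chosen here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

# Copy the list column into 'id', then explode that copy so each list
# element gets its own row. The original lists stay intact in 'ids'.
exploded = df2.assign(id=df2['ids']).explode('id')

# 'id' now holds plain (hashable) strings, so an ordinary merge works.
joined = df1.merge(exploded, on='id', how='left')[['id', 'ids', 'info']]

# Aggregated version: one row per id, with the matches collected into lists.
aggregated = joined.groupby('id', as_index=False).agg(list)

print(joined)
print(aggregated)
```

Using how='left' keeps any id from df1 that has no match in df2 (with NaN in the other columns); how='inner' would drop such rows instead.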
If every list in ids has exactly two elements, split them into separate columns and merge on each:
df2[['id1','id2']] = pd.DataFrame(df2.ids.values.tolist(), index= df2.index)
new_df1 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on = ['id1'])
new_df2 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on = ['id2'])
new_df = pd.concat([new_df1, new_df2])[['id','ids','info']]
Output
id ids info
0 a [a, b] zxcv
1 a [a, z] sdfg
2 b [b, c] asdf
0 b [a, b] zxcv
1 c [b, c] asdf
Aggregation part
new_df.groupby('id')[['ids', 'info']].agg(list)
Output
ids info
id
a [[a, b], [a, z]] [zxcv, sdfg]
b [[b, c], [a, b]] [asdf, zxcv]
c [[b, c]] [asdf]
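The column split above relies on every list in ids having the same length. When the lists vary in length, one alternative is a cross join followed by a membership filter. This is only a sketch, assuming pandas 1.2 or later (for how='cross'); the names pairs, mask and result are illustrative and do not come from the original answers.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

# Pair every row of df1 with every row of df2.
pairs = df1.merge(df2, how='cross')

# Keep only the pairs where the scalar id appears in the list of ids.
mask = [i in ids for i, ids in zip(pairs['id'], pairs['ids'])]
result = pairs[mask].reset_index(drop=True)

print(result)
```

The cross join builds len(df1) * len(df2) intermediate rows, so this is best suited to small frames.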