How to merge two DataFrames based on several columns in the second DataFrame in Python pandas?
I have two pandas DataFrames in Python, like the ones below:
df1
ID
----
11
22
33
44
df2
ID1 ID2 ID3
--------------------
11 | 5 | 114
88 | 22 | 18
99 | 45 | 33
I need to do something like df1 LEFT JOIN df2, merging df1 with df2 using "ID" from df1 and "ID1", "ID2", "ID3" from df2, so that a row of df1 matches the row of df2 where any of ID1, ID2 or ID3 equals its ID.
So as a result I need something like the table below, where 123456 stands in for IDs that have no match:
ID ID1 ID2 ID3
--------------------------
11 | 11 | 5 | 114
22 | 88 | 22 | 18
33 | 99 | 45 | 33
44 | 123456 | 123456 | 123456
How can I do that in pandas? I have no idea where to start.
You can stack df2 to get df2a, then left join df1 with df2a, followed by a left join with the original df2 on the original row index. Fill NaN with 123456 and drop the intermediate columns to arrive at the desired output:
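import pandas as pd

# Long format: level_0 = df2 row, level_1 = column name, ID = cell value
df2a = df2.stack().reset_index(name='ID')
df_out = (df1.merge(df2a, on='ID', how='left')  # find the df2 row holding each ID
          .merge(df2, left_on='level_0', right_index=True, how='left')  # bring in that whole row
          .drop(['level_0', 'level_1'], axis=1)  # remove helper columns
          .fillna(123456)  # IDs with no match
          .astype(int)  # the NaNs forced float dtype; restore integers
          )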
Or simplify the second .merge with .join (thanks to @HenryEcker for the suggestion), as follows:
df2a = df2.stack().reset_index(name='ID')
df_out = (df1.merge(df2a, on='ID', how='left')
          .join(df2, on='level_0')  # left join of the level_0 column against df2's index
          .drop(['level_0', 'level_1'], axis=1)
          .fillna(123456)
          .astype(int)
          )
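Since DataFrame.join(other, on='level_0') matches a column of the calling frame against the index of other and defaults to a left join, the .join line should be interchangeable with the second .merge. A quick sketch that checks this on the sample data from the question (step1, via_join and via_merge are just illustrative names):

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({'ID': [11, 22, 33, 44]})
df2 = pd.DataFrame({'ID1': [11, 88, 99],
                    'ID2': [5, 22, 45],
                    'ID3': [114, 18, 33]})

df2a = df2.stack().reset_index(name='ID')
step1 = df1.merge(df2a, on='ID', how='left')

# join(on=...) joins the level_0 column of step1 against df2's index
via_join = step1.join(df2, on='level_0')
# The explicit merge spelling of the same left join
via_merge = step1.merge(df2, left_on='level_0', right_index=True, how='left')

# Raises AssertionError if the two results differ in values, dtypes or index
pd.testing.assert_frame_equal(via_join, via_merge)
print(via_join)
```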
Result:
print(df_out)
ID ID1 ID2 ID3
0 11 11 5 114
1 22 88 22 18
2 33 99 45 33
3 44 123456 123456 123456
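A self-contained sketch that rebuilds the sample frames from the question and runs the .join variant end to end. It drops the helper columns before filling, then casts with .astype(int), because the missing match for 44 forces ID1 to ID3 to float.

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({'ID': [11, 22, 33, 44]})
df2 = pd.DataFrame({'ID1': [11, 88, 99],
                    'ID2': [5, 22, 45],
                    'ID3': [114, 18, 33]})

# Long format: level_0 = df2 row, level_1 = column name, ID = cell value
df2a = df2.stack().reset_index(name='ID')

df_out = (df1.merge(df2a, on='ID', how='left')   # find the df2 row holding each ID
             .join(df2, on='level_0')            # bring in that whole row
             .drop(columns=['level_0', 'level_1'])
             .fillna(123456)                     # IDs with no match
             .astype(int))                       # NaN had forced float dtype

print(df_out)
```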
Breakdown of the intermediate step:
print(df2a)
level_0 level_1 ID
0 0 ID1 11
1 0 ID2 5
2 0 ID3 114
3 1 ID1 88
4 1 ID2 22
5 1 ID3 18
6 2 ID1 99
7 2 ID2 45
8 2 ID3 33
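The next step, the left merge of df1 with df2a, is where each ID picks up the df2 row it was found in. A sketch that prints it on the sample data (step1 is an illustrative name):

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({'ID': [11, 22, 33, 44]})
df2 = pd.DataFrame({'ID1': [11, 88, 99],
                    'ID2': [5, 22, 45],
                    'ID3': [114, 18, 33]})
df2a = df2.stack().reset_index(name='ID')

# Each ID in df1 picks up the df2 row (level_0) and column (level_1) holding it
step1 = df1.merge(df2a, on='ID', how='left')
print(step1)

# ID 44 has no match, so level_0 gets NaN and is upcast from int64 to float64
print(step1['level_0'].dtype)
```

level_0 is what the second join uses to pull back the full df2 row. Because this is an ordinary merge, an ID that appeared in more than one cell of df2 would match several rows and show up more than once in the output.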
You should have a look at the pandas merging guide (https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html); there are many different solutions there. For example:
import pandas as pd
df1 = pd.DataFrame(
{
"A": ["A0", "A1", "A2", "A3"],
"B": ["B0", "B1", "B2", "B3"],
"C": ["C0", "C1", "C2", "C3"],
"D": ["D0", "D1", "D2", "D3"],
}
)
df2 = pd.DataFrame(
{
"A": ["A4", "A5", "A6", "A7"],
"B": ["B4", "B5", "B6", "B7"],
"C": ["C4", "C5", "C6", "C7"],
"D": ["D4", "D5", "D6", "D7"],
}
)
df3 = pd.DataFrame(
{
"A": ["A8", "A9", "A10", "A11", "A12"],
"B": ["B8", "B9", "B10", "B11", "B12"],
"C": ["C8", "C9", "C10", "C11", "C12"],
"D": ["D8", "D9", "D10", "D11", "D12"],
}
)
df = pd.concat([df1, df2, df3], axis=1)
Gives you:
A B C D A B C D A B C D
0 A0 B0 C0 D0 A4 B4 C4 D4 A8 B8 C8 D8
1 A1 B1 C1 D1 A5 B5 C5 D5 A9 B9 C9 D9
2 A2 B2 C2 D2 A6 B6 C6 D6 A10 B10 C10 D10
3 A3 B3 C3 D3 A7 B7 C7 D7 A11 B11 C11 D11
4 NaN NaN NaN NaN NaN NaN NaN NaN A12 B12 C12 D12
So, in your case:
df1 = pd.DataFrame(
{
"ID": [11, 22, 33, 44]
}
)
df2 = pd.DataFrame(
{
"ID1": [11, 88, 99],
"ID2": [5, 22, 45],
"ID3": [114, 18, 33]
}
)
df = pd.concat([df1, df2], axis=1)
df = df.fillna(123456).astype(int)  # replace NaNs with the value you want, then restore integer dtype
Gives:
ID ID1 ID2 ID3
0 11 11 5 114
1 22 88 22 18
2 33 99 45 33
3 44 123456 123456 123456
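One caveat worth checking: pd.concat(..., axis=1) lines rows up by index label, not by the values in ID, so it only reproduces the desired table here because row i of df2 happens to contain the ID in row i of df1. A quick check with the same df2 values in a different row order (df2_shuffled is a hypothetical variant of the question's data):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [11, 22, 33, 44]})
# Hypothetical: the question's df2 values, with the rows listed in another order
df2_shuffled = pd.DataFrame({'ID1': [99, 11, 88],
                             'ID2': [45, 5, 22],
                             'ID3': [33, 114, 18]})

# concat(axis=1) pairs rows by index label (0, 1, 2, ...), not by matching IDs
df = pd.concat([df1, df2_shuffled], axis=1)
print(df)
```

Here ID 11 ends up next to 99, 45, 33, a row that does not contain 11; the stack-and-merge approach in the other answer matches on the values themselves.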