How to merge two DataFrames based on several columns of the second DataFrame in Python Pandas?

Question

I have two Pandas DataFrames in Python, like below:

df1

ID
----
11
22
33
44

df2

ID1   ID2   ID3
--------------------
11  | 5   | 114
88  | 22  | 18
99  | 45  | 33

df1 has more rows than df2.
The values in both DataFrames are of type int.

I need to do something like df1 LEFT JOIN df2, merging df1 with df2 using "ID" from df1 and "ID1", "ID2", "ID3" from df2:

Merge the DataFrames on ID (df1) and ID1 (df2).
If ID does not match ID1 --> merge on ID and ID2.
If ID does not match ID2 either --> merge on ID and ID3.
If a row does not match at all --> give 123456.

So as a result I need something like this:

ID    ID1   ID2   ID3
--------------------------
11  | 11     | 5       | 114
22  | 88     | 22      | 18
33  | 99     | 45      | 33
44  | 123456 | 123456  | 123456

How can I do that in Python Pandas? I have no idea where to start.
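The fallback rules in the question can be sketched directly as one left join per column of df2, tried in priority order (ID1, then ID2, then ID3), where a match from an earlier column is never overwritten. This is a minimal sketch under two assumptions: df2 has no missing values, and if a value occurs more than once within the same column, the first df2 row wins. The helper name `fallback_merge` is hypothetical.

```python
import pandas as pd


def fallback_merge(df1, df2, key="ID", cols=("ID1", "ID2", "ID3"), fill=123456):
    """Hypothetical helper: left-join df1 to df2, trying each column of `cols` in order."""
    out = None
    for col in cols:
        # Assumption: keep only the first df2 row per value, so the left join
        # keeps df1's row count and order (and therefore the same index).
        step = df1.merge(df2.drop_duplicates(col), left_on=key, right_on=col, how="left")
        # combine_first keeps values already found and only fills rows that are still NaN.
        # Assumption: df2 has no NaN cells, so a row is either fully matched or fully NaN.
        out = step if out is None else out.combine_first(step)
    # Rows that matched nothing get the fill value; cast back to int.
    return out[[key, *cols]].fillna(fill).astype(int)


df1 = pd.DataFrame({"ID": [11, 22, 33, 44]})
df2 = pd.DataFrame({"ID1": [11, 88, 99], "ID2": [5, 22, 45], "ID3": [114, 18, 33]})

result = fallback_merge(df1, df2)
print(result)
```

Because each pass only fills rows that are still empty, the priority order from the question is respected even when an ID appears in more than one column of df2.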

Answer 1

You can stack df2 to get df2a, then left join df1 with df2a, followed by a left join with the original df2 on its original index. Fill NaN with 123456 and drop the intermediate columns to arrive at the desired output:

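import pandas as pd

df1 = pd.DataFrame({'ID': [11, 22, 33, 44]})
df2 = pd.DataFrame({'ID1': [11, 88, 99], 'ID2': [5, 22, 45], 'ID3': [114, 18, 33]})

# Long format: one row per cell of df2, with its row label (level_0) and column name (level_1)
df2a = df2.stack().reset_index(name='ID')

df_out = (df1.merge(df2a, on='ID', how='left')
             .merge(df2, left_on='level_0', right_index=True, how='left')
             .drop(['level_0', 'level_1'], axis=1)
             .fillna(123456)   # rows of df1 with no match anywhere in df2
             .astype(int)      # NaN made the ID columns float; cast back to int
         )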

Or simplify the second .merge with .join (thanks to @HenryEcker for the suggestion), as follows:

df2a = df2.stack().reset_index(name='ID')

df_out = (df1.merge(df2a, on='ID', how='left')
             .join(df2, on='level_0')   # left join on df2's index
             .drop(['level_0', 'level_1'], axis=1)
             .fillna(123456)
             .astype(int)
         )

Result:

print(df_out)

   ID     ID1     ID2     ID3
0  11      11       5     114
1  22      88      22      18
2  33      99      45      33
3  44  123456  123456  123456

Breakdown of the stacking step:

print(df2a)

   level_0 level_1   ID
0        0     ID1   11
1        0     ID2    5
2        0     ID3  114
3        1     ID1   88
4        1     ID2   22
5        1     ID3   18
6        2     ID1   99
7        2     ID2   45
8        2     ID3   33
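Because df2a has one row per cell, an ID that occurs in more than one cell of df2 would produce several output rows. A minimal sketch of one way to keep a single match per ID while following the question's ID1 → ID2 → ID3 priority is to rank df2a by column before dropping duplicates. The df2 below is a hypothetical variant in which 22 appears both in ID2 (row 1) and in ID1 (row 2); the `col_rank` column and the `best` name are illustrative, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [11, 22, 33, 44]})
# Hypothetical df2: 22 appears in ID2 of row 1 and in ID1 of row 2
df2 = pd.DataFrame({"ID1": [11, 88, 22], "ID2": [5, 22, 45], "ID3": [114, 18, 33]})

df2a = df2.stack().reset_index(name="ID")
# Rank each cell by its column's position in df2 (ID1 = 0, ID2 = 1, ID3 = 2)
df2a["col_rank"] = df2a["level_1"].map({c: i for i, c in enumerate(df2.columns)})
# Earlier column wins, then earlier row; keep only the best match per ID
best = df2a.sort_values(["col_rank", "level_0"]).drop_duplicates("ID")

df_out = (df1.merge(best[["ID", "level_0"]], on="ID", how="left")
             .join(df2, on="level_0")
             .drop(columns="level_0")
             .fillna(123456)
             .astype(int))
print(df_out)
```

Without the sort, `drop_duplicates` would keep the first cell in row order, which here is the ID2 match in row 1 rather than the ID1 match in row 2.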

Answer 2

You should have a look here (https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html); there are many different solutions. For example:

import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)


df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    }
)


df3 = pd.DataFrame(
    {
        "A": ["A8", "A9", "A10", "A11", "A12"],
        "B": ["B8", "B9", "B10", "B11", "B12"],
        "C": ["C8", "C9", "C10", "C11", "C12"],
        "D": ["D8", "D9", "D10", "D11", "D12"],
    }
)
df = pd.concat([df1, df2, df3], axis=1)

Gives you:

     A    B    C    D    A    B    C    D    A    B    C    D
0   A0   B0   C0   D0   A4   B4   C4   D4   A8   B8   C8   D8
1   A1   B1   C1   D1   A5   B5   C5   D5   A9   B9   C9   D9
2   A2   B2   C2   D2   A6   B6   C6   D6  A10  B10  C10  D10
3   A3   B3   C3   D3   A7   B7   C7   D7  A11  B11  C11  D11
4  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  A12  B12  C12  D12

So, in your case:

df1 = pd.DataFrame(
    {
        "ID": [11, 22, 33, 44]
    }
)


df2 = pd.DataFrame(
    {
        "ID1": [11, 88, 99],
        "ID2": [5, 22, 45],
        "ID3": [114, 18, 33]
    }
)


df = pd.concat([df1, df2], axis=1)
df = df.fillna(123456).astype(int)  # replace NaNs with the value you want, then restore the int dtype

Gives:

   ID     ID1     ID2     ID3
0  11      11       5     114
1  22      88      22      18
2  33      99      45      33
3  44  123456  123456  123456
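Note that pd.concat(..., axis=1) aligns rows by index label, not by the ID values, so the output above matches the expected result only because df1's rows happen to line up with df2's. A quick check, assuming the same IDs in a different (reversed) row order, shows the pairing is positional:

```python
import pandas as pd

# Same IDs as the question, but in reversed row order (illustrative assumption)
df1 = pd.DataFrame({"ID": [44, 33, 22, 11]})
df2 = pd.DataFrame({"ID1": [11, 88, 99], "ID2": [5, 22, 45], "ID3": [114, 18, 33]})

# concat pairs row 0 with row 0, row 1 with row 1, ... regardless of the values
df = pd.concat([df1, df2], axis=1).fillna(123456).astype(int)
print(df)
```

Here ID 44 ends up next to df2's first row and ID 11 gets the fill value, so when the rows are not already aligned, a value-based merge such as the one in Answer 1 is needed.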

Answer 1: score 2, accepted, 2021-10-16 22:15:07

Answer 2: score 0, 2021-10-16 21:45:45
