How to merge two Data Frames based on a few columns in the second Data Frame in Python Pandas?

I have two Pandas Data Frames in Python like below:

df1

ID
----
11
22
33
44

df2

ID1   ID2   ID3
--------------------
11  | 5   | 114
88  | 22  | 18
99  | 45  | 33
  • df1 has more rows than df2
  • the values in both Data Frames are of type int

I need to do something like df1 LEFT JOIN df2, merging df1 with df2 using "ID" from df1 and "ID1", "ID2", "ID3" from df2:

  1. merge the Data Frames on ID (df1) and ID1 (df2)
  2. if ID does not match ID1 --> merge on ID and ID2
  3. if ID does not match ID2 --> merge on ID and ID3
  4. fill in 123456 if a row does not match at all

So as a result I need something like below:

ID    ID1   ID2   ID3
--------------------------
11  | 11     | 5       | 114
22  | 88     | 22      | 18
33  | 99     | 45      | 33
44  | 123456 | 123456  | 123456

How can I do that in Python Pandas? I have no idea where to start.

You can stack df2 into a long-form frame df2a, left join df1 with df2a on ID, and then left join the original df2 on its original row index (level_0). Fill NaN with 123456 and drop the intermediate columns to arrive at the desired output:

df2a = df2.stack().reset_index(name='ID')

df_out = (df1.merge(df2a, on='ID', how='left')
             .merge(df2, left_on='level_0', right_index=True, how='left')
             .fillna(123456, downcast='infer')
             .drop(['level_0', 'level_1'], axis=1)
         )

Or simplify the second .merge with .join (thanks to @HenryEcker for the suggestion), as follows:

df2a = df2.stack().reset_index(name='ID')

df_out = (df1.merge(df2a, on='ID', how='left')
             .join(df2, on='level_0')
             .fillna(123456, downcast='infer')
             .drop(['level_0', 'level_1'], axis=1)
         )

Result:

print(df_out)

   ID     ID1     ID2     ID3
0  11      11       5     114
1  22      88      22      18
2  33      99      45      33
3  44  123456  123456  123456
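
The snippets above rely on the downcast argument of fillna, which is deprecated in recent pandas releases. As a hedged, self-contained sketch of the same stack-and-join idea, the version below builds the sample frames from the question and casts back to int explicitly. Selecting only level_0 and ID from df2a and calling astype(int) at the end are choices made for this sketch, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"ID": [11, 22, 33, 44]})
df2 = pd.DataFrame({"ID1": [11, 88, 99], "ID2": [5, 22, 45], "ID3": [114, 18, 33]})

# Long form: one row per (original row position, source column, value)
df2a = df2.stack().reset_index(name="ID")

df_out = (
    df1.merge(df2a[["level_0", "ID"]], on="ID", how="left")  # find the df2 row holding each ID
       .join(df2, on="level_0")                              # pull in that whole df2 row
       .drop(columns="level_0")                              # remove the helper column
       .fillna(123456)                                       # unmatched rows get the sentinel
       .astype(int)                                          # NaN forced floats; restore ints
)
print(df_out)
```

This assumes every column in the result can be cast to int, which holds for the data in the question.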

Breakdown of the intermediate step:

print(df2a)

   level_0 level_1   ID
0        0     ID1   11
1        0     ID2    5
2        0     ID3  114
3        1     ID1   88
4        1     ID2   22
5        1     ID3   18
6        2     ID1   99
7        2     ID2   45
8        2     ID3   33
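
The stacked lookup treats ID1, ID2 and ID3 alike. If a value could occur more than once in df2, the left merge would return duplicate rows and the ID1, then ID2, then ID3 priority from the question would be lost. A minimal sketch of one way to enforce that priority follows. The df2 data here is hypothetical (22 appears in both ID2 and ID1), and the rank helper column and the lookup name are assumptions introduced for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [11, 22, 33, 44]})
# Hypothetical data: 22 occurs in ID2 of row 1 and in ID1 of row 2
df2 = pd.DataFrame({"ID1": [11, 88, 22], "ID2": [5, 22, 45], "ID3": [114, 18, 33]})

priority = ["ID1", "ID2", "ID3"]  # earlier columns win

lookup = df2[priority].stack().reset_index(name="ID")
# Rank each stacked value by the priority of the column it came from
lookup["rank"] = lookup["level_1"].map({col: i for i, col in enumerate(priority)})
# Keep only the highest-priority occurrence of each value
lookup = lookup.sort_values(["rank", "level_0"]).drop_duplicates("ID", keep="first")

df_out = (
    df1.merge(lookup[["level_0", "ID"]], on="ID", how="left")
       .join(df2, on="level_0")
       .drop(columns="level_0")
       .fillna(123456)
       .astype(int)
)
print(df_out)
```

With this data, ID 22 is paired with the row where it appears in ID1 (22, 45, 33) rather than the row where it appears in ID2.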

Have a look at the pandas merging guide ( https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html ); it covers many different solutions. For example:

import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)


df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    }
)


df3 = pd.DataFrame(
    {
        "A": ["A8", "A9", "A10", "A11", "A12"],
        "B": ["B8", "B9", "B10", "B11", "B12"],
        "C": ["C8", "C9", "C10", "C11", "C12"],
        "D": ["D8", "D9", "D10", "D11", "D12"],
    }
)

df = pd.concat([df1, df2, df3], axis=1)

This gives you:

     A    B    C    D    A    B    C    D    A    B    C    D
0   A0   B0   C0   D0   A4   B4   C4   D4   A8   B8   C8   D8
1   A1   B1   C1   D1   A5   B5   C5   D5   A9   B9   C9   D9
2   A2   B2   C2   D2   A6   B6   C6   D6  A10  B10  C10  D10
3   A3   B3   C3   D3   A7   B7   C7   D7  A11  B11  C11  D11
4  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  A12  B12  C12  D12

So, in your case:

df1 = pd.DataFrame(
    {
        "ID": [11, 22, 33, 44]
    }
)


df2 = pd.DataFrame(
    {
        "ID1": [11, 88, 99],
        "ID2": [5, 22, 45],
        "ID3": [114, 18, 33]
    }
)


df = pd.concat([df1, df2], axis=1)
df = df.fillna(123456).astype(int)  # replace NaNs with the value you want and restore the int dtype

Gives:

   ID     ID1     ID2     ID3
0  11      11       5     114
1  22      88      22      18
2  33      99      45      33
3  44  123456  123456  123456
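
Note that pd.concat with axis=1 aligns rows by index label, not by matching ID values, so it reproduces the desired table here only because the rows of df2 already sit in the matching order. The short check below illustrates this with the same df2 rows in a different, hypothetical order; the matched name is introduced for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [11, 22, 33, 44]})
# Same rows as df2 in the question, in a different (hypothetical) order
df2 = pd.DataFrame({"ID1": [99, 11, 88], "ID2": [45, 5, 22], "ID3": [33, 114, 18]})

# Rows are paired by position in the index, not by value
df = pd.concat([df1, df2], axis=1)

# For each row, check whether ID equals any of ID1, ID2, ID3
matched = df[["ID1", "ID2", "ID3"]].eq(df["ID"], axis=0).any(axis=1)
print(df)
print(matched.tolist())
```

No row of the concatenated frame pairs an ID with the df2 row that contains it, so a value-based merge is needed whenever the row order is not guaranteed.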


