How to merge Pandas DataFrames with row replication?

I have two DataFrames that look like this:

    bin     last_4  brand       name    chargeback
0   112233  1234    visa        Joe     0
1   445566  5678    visa        Susy    0
2   778899  9012    mastercard  James   0

    bin     last_4  chargeback
0   112233  1234    1
1   445566  5678    1

I want to get the following result:

    bin     last_4  brand       name    chargeback
0   112233  1234    visa        Joe     0
1   445566  5678    visa        Susy    0
2   778899  9012    mastercard  James   0
3   112233  1234    visa        Joe     1
4   445566  5678    visa        Susy    1

I have already tried pd.merge() several ways. However, when I called pd.merge(df_1, df_2, how='outer', on=['bin', 'last_4']) I got only 3 rows, with the 'chargeback' column duplicated, like this:

    bin     last_4  brand       name    chargeback_x    chargeback_y
0   112233  1234    visa        Joe     0               1.0
1   445566  5678    visa        Susy    0               1.0
2   778899  9012    mastercard  James   0               NaN

And when I called pd.merge(df_1, df_2, how='outer', on=['bin', 'last_4', 'chargeback']) I got NaN values in the 'brand' and 'name' columns:

    bin     last_4  brand       name    chargeback
0   112233  1234    visa        Joe     0
1   445566  5678    visa        Susy    0
2   778899  9012    mastercard  James   0
3   112233  1234    NaN         NaN     1
4   445566  5678    NaN         NaN     1

How can I get these replicated rows with the full information filled in?

You can use pd.concat together with a merge:

pd.concat([df1, df2.merge(df1.drop('chargeback', axis=1), how='left', on=['bin', 'last_4'])])
Out[1]: 
      bin  last_4       brand   name  chargeback
0  112233    1234        visa    Joe           0
1  445566    5678        visa   Susy           0
2  778899    9012  mastercard  James           0
0  112233    1234        visa    Joe           1
1  445566    5678        visa   Susy           1

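Because the second DataFrame lacks the 'brand' and 'name' columns, first left-merge it with the first DataFrame on 'bin' and 'last_4', dropping the first DataFrame's 'chargeback' column so it does not collide with the second's. Then concatenate this merged DataFrame onto the first DataFrame. Note that the result keeps each piece's original index (0, 1, 2, 0, 1); pass ignore_index=True to pd.concat if you want a continuous 0 to 4 index as in the desired output.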
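For a version you can run end to end, here is a minimal sketch that rebuilds the two example DataFrames from the question and applies the same merge-then-concat approach. The names df1, df2, enriched and result are just illustrative, and the integer dtypes are an assumption about the original data.

```python
import pandas as pd

# Rebuild the two frames from the question (dtypes are assumed).
df1 = pd.DataFrame({
    "bin": [112233, 445566, 778899],
    "last_4": [1234, 5678, 9012],
    "brand": ["visa", "visa", "mastercard"],
    "name": ["Joe", "Susy", "James"],
    "chargeback": [0, 0, 0],
})
df2 = pd.DataFrame({
    "bin": [112233, 445566],
    "last_4": [1234, 5678],
    "chargeback": [1, 1],
})

# Fill in 'brand' and 'name' for each df2 row by looking them up in df1.
# df1's own 'chargeback' is dropped so it does not clash with df2's.
enriched = df2.merge(
    df1.drop(columns="chargeback"),
    how="left",
    on=["bin", "last_4"],
)

# Stack the enriched rows under df1, renumber the index,
# and keep df1's column order.
result = pd.concat([df1, enriched], ignore_index=True)[df1.columns]
print(result)
```

One caveat: if a (bin, last_4) pair appears more than once in df1, the left merge will produce one row per match, so you may want to de-duplicate the lookup side first.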
