
Pandas: conditionally merge two DataFrames with a one-to-many relationship

I am trying to merge two pandas DataFrames that have a one-to-many relationship. However, there are a couple of caveats, explained below.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4] })

df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, 13] })

One way:

>>> df1.merge(df2).drop_duplicates(subset=['name'], keep='last')
  name  col1  col2  col3
1   AA     1     1  10.0
3   BB     2     2  11.0
4   CC     3     3  12.0
5   DD     4     4  13.0
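Note that `keep='last'` is positional. With an inner merge, pandas keeps the left frame's key order, and in practice the matching `df2` rows for each name appear in the order they have in `df2`, so this only works if the wanted row happens to come last. A minimal sketch using a hypothetical helper `last_match` shows the result changing when the same `df2` rows are reversed:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4]})
df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, 13]})


def last_match(left, right, on='name'):
    """Hypothetical helper: inner-merge, then keep the last merged row per key."""
    merged = left.merge(right, on=on)
    return merged.drop_duplicates(subset=[on], keep='last').reset_index(drop=True)


as_given = last_match(df1, df2)
# Same df2 rows in reverse order: the "last" match for AA and BB changes.
reversed_rows = last_match(df1, df2.iloc[::-1])

print(as_given)
print(reversed_rows)
```

With the reversed input, AA picks up `0` and BB picks up `NaN`, which is exactly what the filtering approach avoids.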

Try filtering first, then merging:

>>> out = df1.merge(df2[df2.col3.ne(0) & df2.col3.notna()])
>>> out
  name  col1  col2  col3
0   AA     1     1  10.0
1   BB     2     2  11.0
2   CC     3     3  12.0
3   DD     4     4  13.0
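The filter needs both conditions because `NaN != 0` evaluates to `True`, so `ne(0)` alone would let the `NaN` row through. One further caveat: if every `df2` row for a name is filtered out, the default inner merge drops that name entirely. A minimal sketch with a hypothetical helper `merge_valid` and a hypothetical extra name `EE` (whose only `col3` value is 0) shows how `how='left'` keeps it instead:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD', 'EE'],
                    'col1': [1, 2, 3, 4, 5],
                    'col2': [1, 2, 3, 4, 5]})
# 'EE' is a hypothetical extra name whose only df2 value is 0.
df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD', 'EE'],
                    'col3': [0, 10, np.nan, 11, 12, 13, 0]})


def merge_valid(left, right, col, on='name', how='inner'):
    """Hypothetical helper: drop right rows where `col` is 0 or NaN, then merge."""
    # ne(0) alone is not enough: NaN != 0 is True, so notna() is also required.
    valid = right[right[col].ne(0) & right[col].notna()]
    return left.merge(valid, on=on, how=how)


inner_result = merge_valid(df1, df2, 'col3')             # 'EE' is dropped
left_result = merge_valid(df1, df2, 'col3', how='left')  # 'EE' kept, col3 is NaN
print(inner_result)
print(left_result)
```

Which behaviour is right depends on whether names with no valid match should appear in the output at all.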
