I am trying to merge two pandas DataFrames that have a one-to-many relationship. However, there are a couple of caveats. Explanation below.
import numpy as np  # needed for np.nan below
import pandas as pd
df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4]})
df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, 13]})
One way is to merge first, then keep only the last row for each name:
>>> df1.merge(df2).drop_duplicates(subset=['name'], keep='last')
  name  col1  col2  col3
1   AA     1     1  10.0
3   BB     2     2  11.0
4   CC     3     3  12.0
5   DD     4     4  13.0
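The merge-then-deduplicate approach above can be sketched as a self-contained script. It assumes the row you want for each name is always the last one in df2, which holds for the sample data but is not guaranteed in general. The names `merged` and `result` are illustrative only, and the final `reset_index` is an optional addition that is not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4]})
df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, 13]})

# Inner merge on the shared 'name' column: one output row per df2 row.
merged = df1.merge(df2, on='name')

# Keep only the last row for each name. This relies on df2's row order
# (assumption: the wanted value comes last within each name).
result = merged.drop_duplicates(subset=['name'], keep='last')

# Optional: renumber the index 0..n-1 instead of keeping 1, 3, 4, 5.
result = result.reset_index(drop=True)
print(result)
```

If the wanted row is not reliably last, sorting df2 first or filtering it explicitly is likely safer than depending on position.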
Try filtering first, then merging:
out = df1.merge(df2[df2.col3.ne(0) & df2.col3.notna()])
out
Out[69]:
  name  col1  col2  col3
0   AA     1     1  10.0
1   BB     2     2  11.0
2   CC     3     3  12.0
3   DD     4     4  13.0
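For reference, here is a minimal runnable sketch of the filter-then-merge approach. It assumes, based on the filter in the answer, that rows with col3 equal to 0 or NaN are the ones to ignore. The name `valid` is illustrative, and the `validate='one_to_one'` argument is an optional extra check that is not in the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4]})
df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, 13]})

# Drop rows whose col3 is 0 or missing. Both conditions are needed:
# NaN compares as "not equal to 0", so .ne(0) alone would keep NaN rows.
valid = df2[df2['col3'].ne(0) & df2['col3'].notna()]

# validate='one_to_one' makes pandas raise a MergeError if any name
# still appears more than once on either side after filtering.
out = df1.merge(valid, on='name', validate='one_to_one')
print(out)
```

Unlike the deduplication approach, this does not depend on row order, but a name whose rows are all 0 or NaN would drop out of the inner merge entirely. Passing `how='left'` would keep such names with NaN in col3.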