Pandas: combine two dataframes with same columns by picking values
I have two dataframes:
The first:
id time_begin time_end
0 1938 1946
1 1991 1991
2 1359 1991
4 1804 1937
6 1368 1949
... ... ...
The second:
id time_begin time_end
1 1946 1946
3 1940 1954
5 1804 1925
6 1978 1978
7 1912 1949
Now I want to combine the two dataframes so that I get all rows from both. Since a row is sometimes present in both dataframes (e.g. ids 1 and 6), I want to pick the minimum time_begin and the maximum time_end of the two. My expected result:
id time_begin time_end
0 1938 1946
1 1946 1991
2 1359 1991
3 1940 1954
4 1804 1937
5 1804 1925
6 1368 1978
7 1912 1949
... ... ...
How can I achieve this? As far as I can tell, the normal join/combine operations do not allow for this.
You could first merge the dataframes and then use groupby with agg to pick min(time_begin) and max(time_end) for each id:
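import pandas as pd

df1 = pd.DataFrame({'id': [0, 1, 2, 4, 6],
                    'time_begin': [1938, 1991, 1359, 1804, 1368],
                    'time_end': [1946, 1991, 1991, 1937, 1949]})
df2 = pd.DataFrame({'id': [1, 3, 5, 6, 7],
                    'time_begin': [1946, 1940, 1804, 1978, 1912],
                    'time_end': [1946, 1954, 1925, 1978, 1949]})
# merge: an outer merge keeps every row from both dataframes
df = df1.merge(df2, how='outer')
# groupby: per id, take the earliest time_begin and the latest time_end
df = df.groupby('id').agg({'time_begin': 'min', 'time_end': 'max'})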
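Output:
    time_begin  time_end
id
0         1938      1946
1         1946      1991
2         1359      1991
3         1940      1954
4         1804      1937
5         1804      1925
6         1368      1978
7         1912      1949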
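A side note on why the merge step works without an `on` argument: when `on` is omitted, pandas joins on every column the two frames share, which here is all three. The outer merge therefore behaves like a union of the rows, and the ids that appear in both frames stay as separate rows until the groupby collapses them. The sketch below only illustrates that behaviour on the sample data from the question; the variable names `merged` and `merged_explicit` are mine.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [0, 1, 2, 4, 6],
                    'time_begin': [1938, 1991, 1359, 1804, 1368],
                    'time_end': [1946, 1991, 1991, 1937, 1949]})
df2 = pd.DataFrame({'id': [1, 3, 5, 6, 7],
                    'time_begin': [1946, 1940, 1804, 1978, 1912],
                    'time_end': [1946, 1954, 1925, 1978, 1949]})

# No `on` given: the join keys default to all shared columns
merged = df1.merge(df2, how='outer')

# The same call with the join keys spelled out
merged_explicit = df1.merge(df2, how='outer',
                            on=['id', 'time_begin', 'time_end'])

# ids 1 and 6 each still appear twice, once per source frame
print(merged[merged['id'].isin([1, 6])])
```

If you only want rows matched on id, pass `on='id'` instead; pandas then suffixes the overlapping time columns (`_x`, `_y` by default) and you would have to combine them yourself.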
The trick is to define a different aggregation function for each column:
pd.concat([df1, df2]).groupby('id').agg({'time_begin':'min', 'time_end':'max'})
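For completeness, here is a self-contained sketch of the concat approach on the sample data from the question. It uses named aggregation and `reset_index()` so that `id` comes back as an ordinary column, matching the layout of the expected result; the name `result` is just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [0, 1, 2, 4, 6],
                    'time_begin': [1938, 1991, 1359, 1804, 1368],
                    'time_end': [1946, 1991, 1991, 1937, 1949]})
df2 = pd.DataFrame({'id': [1, 3, 5, 6, 7],
                    'time_begin': [1946, 1940, 1804, 1978, 1912],
                    'time_end': [1946, 1954, 1925, 1978, 1949]})

result = (
    pd.concat([df1, df2], ignore_index=True)   # stack all rows from both frames
      .groupby('id')                           # one group per id
      .agg(time_begin=('time_begin', 'min'),   # earliest start per id
           time_end=('time_end', 'max'))       # latest end per id
      .reset_index()                           # turn the id index back into a column
)
print(result)
```

Since groupby sorts by the group key by default, the rows come out ordered by id.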