
Pandas: combine two dataframes with same columns by picking values

I have two dataframes.

The first:

id  time_begin  time_end
0   1938    1946
1   1991    1991
2   1359    1991
4   1804    1937
6   1368    1949
... ... ...

The second:

id  time_begin  time_end
1   1946    1946
3   1940    1954
5   1804    1925
6   1978    1978
7   1912    1949

Now, I want to combine the two dataframes so that I get all rows from both. But since a row will sometimes be present in both dataframes (e.g. rows 1 and 6), I want to pick the minimum time_begin of the two and the maximum time_end of the two. Thus my expected result:

id  time_begin  time_end
0   1938    1946
1   1946    1991
2   1359    1991
3   1940    1954
5   1804    1925
4   1804    1937
6   1368    1978
7   1912    1949
... ... ...

How can I achieve this? Normal join/combine operations do not allow for this, as far as I can tell.

You could first merge the dataframes and then use groupby with agg to pick min(time_begin) and max(time_end):

import pandas as pd

df1 = pd.DataFrame({'id': [0, 1, 2, 4, 6],
                    'time_begin': [1938, 1991, 1359, 1804, 1368],
                    'time_end': [1946, 1991, 1991, 1937, 1949]})
df2 = pd.DataFrame({'id': [1, 3, 5, 6, 7],
                    'time_begin': [1946, 1940, 1804, 1978, 1912],
                    'time_end': [1946, 1954, 1925, 1978, 1949]})

# Outer merge keeps every row from both frames
df = df1.merge(df2, how='outer')
# Per id, take the earliest time_begin and the latest time_end
df = df.groupby('id').agg({'time_begin': 'min', 'time_end': 'max'})

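Output (indexed by id):

    time_begin  time_end
id
0         1938      1946
1         1946      1991
2         1359      1991
3         1940      1954
4         1804      1937
5         1804      1925
6         1368      1978
7         1912      1949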
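One detail that may be worth spelling out: because merge is called without on, pandas joins on every column the two frames share, so the outer merge here effectively stacks the rows and collapses only exact duplicates. The groupby step is what actually picks the min and max. The result is also indexed by id rather than having id as a column. A minimal sketch, assuming you want id back as a regular column as in the expected result (the names merged and result are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [0, 1, 2, 4, 6],
                    'time_begin': [1938, 1991, 1359, 1804, 1368],
                    'time_end': [1946, 1991, 1991, 1937, 1949]})
df2 = pd.DataFrame({'id': [1, 3, 5, 6, 7],
                    'time_begin': [1946, 1940, 1804, 1978, 1912],
                    'time_end': [1946, 1954, 1925, 1978, 1949]})

# With no `on=`, merge joins on all shared columns (id, time_begin, time_end),
# so ids 1 and 6 still appear twice after the outer merge.
merged = df1.merge(df2, how='outer')

# Collapse each id to one row, then turn the `id` index back into a column.
result = (merged.groupby('id')
                .agg({'time_begin': 'min', 'time_end': 'max'})
                .reset_index())
print(result)
```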

The trick is to define a different aggregation function for each column:

pd.concat([df1, df2]).groupby('id').agg({'time_begin':'min', 'time_end':'max'})
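For a version that can be pasted and run as-is, here is a rough sketch of the same concat approach wrapped in a helper. The function name combine_ranges is hypothetical, and named aggregation is just one way to spell the per-column functions. Unlike an outer merge, concat keeps exact duplicate rows, which makes no difference to min and max.

```python
import pandas as pd


def combine_ranges(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: all ids from both frames, one row per id."""
    # Stack the rows of both frames; ids present in both appear twice.
    stacked = pd.concat([a, b], ignore_index=True)
    # Different aggregation per column: earliest begin, latest end.
    return (stacked.groupby('id')
                   .agg(time_begin=('time_begin', 'min'),
                        time_end=('time_end', 'max'))
                   .reset_index())


df1 = pd.DataFrame({'id': [0, 1, 2, 4, 6],
                    'time_begin': [1938, 1991, 1359, 1804, 1368],
                    'time_end': [1946, 1991, 1991, 1937, 1949]})
df2 = pd.DataFrame({'id': [1, 3, 5, 6, 7],
                    'time_begin': [1946, 1940, 1804, 1978, 1912],
                    'time_end': [1946, 1954, 1925, 1978, 1949]})

combined = combine_ranges(df1, df2)
print(combined)
```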


 