How to collapse a sequence of rows if they overlap across two columns in pandas

I have a dataframe as follows:

age_start  age_end
2          6
6          10
11         16
17         18
21         25
27         30
30         34

I want to aggregate successive rows where the age_end value of one row overlaps with the age_start value of the next row. For example, the first two rows would be collapsed because 6 is the value they share. The last two rows would also be collapsed because the overlapping value is 30. The goal is to create broader age groups, and the solution should scale so that it can aggregate any number of successive rows, not just pairs. The desired output is:

age_start  age_end
2          10
11         16
17         18
21         25
27         34
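
Answer:

import pandas as pd
# Example frame from the question.
df = pd.DataFrame({'age_start': [2, 6, 11, 17, 21, 27, 30],
                   'age_end': [6, 10, 16, 18, 25, 30, 34]})
# Mark transitions: a row opens a new group when its age_start is greater than the previous age_end.
df.loc[df.age_start.gt(df.age_end.shift(1)), 'group'] = 1
# Create groups: number the marked rows, carry each label forward, and put the leading rows in group 0.
df['group'] = df['group'].cumsum().ffill().fillna(0)
# Extract start/stop point from groups:
out = df.groupby('group').agg({'age_start': 'min', 'age_end': 'max'}).reset_index(drop=True)
print(out)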

Output:

   age_start  age_end
0          2       10
1         11       16
2         17       18
3         21       25
4         27       34
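
The approach above compares each age_start only with the age_end of the row immediately before it, which fits the sorted, non-nested rows in the question. As a minimal sketch of a reusable variant, assuming the ranges might be unsorted or nested, the comparison can instead use the running maximum of age_end. The function name `collapse_overlaps` and its `start`/`end` parameters are hypothetical, introduced here only for illustration.

```python
import pandas as pd


def collapse_overlaps(df, start="age_start", end="age_end"):
    """Merge rows whose [start, end] ranges touch or overlap (hypothetical helper)."""
    # Sort by start so that overlapping ranges end up on adjacent rows.
    ordered = df.sort_values([start, end]).reset_index(drop=True)
    # Largest end value among all earlier rows; NaN for the first row.
    prev_max_end = ordered[end].cummax().shift(1)
    # A row opens a new group when it starts after every earlier range has ended.
    # Comparing against NaN gives False, so the first row falls into group 0.
    group = ordered[start].gt(prev_max_end).cumsum().rename("group")
    return (
        ordered.groupby(group)
        .agg({start: "min", end: "max"})
        .reset_index(drop=True)
    )


# Data from the question.
df = pd.DataFrame({
    "age_start": [2, 6, 11, 17, 21, 27, 30],
    "age_end": [6, 10, 16, 18, 25, 30, 34],
})
out = collapse_overlaps(df)
print(out)

# A nested case: (2, 3) and (6, 8) both sit inside (1, 10).
nested = pd.DataFrame({"age_start": [1, 2, 6], "age_end": [10, 3, 8]})
nested_out = collapse_overlaps(nested)
print(nested_out)
```

On the question's data this should give the same five rows as the output above. In the nested case, comparing only with the previous row's age_end would split off (6, 8) because 6 is greater than 3, whereas the running maximum keeps all three rows in one group.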
