Conditionally merge rows in pandas DataFrame
I have the following pandas DataFrame:
col1 col2 col3 col4
A 2021-03-28 01:40:00 1.381158 0.0
A 2021-03-28 01:50:00 0.480089 0.0
A 2021-03-28 03:00:00 0.000000 0.0
A 2021-03-28 03:00:00 0.111088 0.0
A 2021-03-28 03:10:00 0.000000 0.0
A 2021-03-28 03:10:00 0.000000 0.0
A 2021-03-28 03:10:00 0.151066 0.0
B 2021-03-28 03:10:00 1.231341 1.0
I need to merge rows that have the same col1 and col2 values, and take the non-zero value for col3.
This is the expected output:
col1 col2 col3 col4
A 2021-03-28 01:40:00 1.381158 0.0
A 2021-03-28 01:50:00 0.480089 0.0
A 2021-03-28 03:00:00 0.111088 0.0
A 2021-03-28 03:10:00 0.151066 0.0
B 2021-03-28 03:10:00 1.231341 1.0
How can I do this merging?
We can use groupby + idxmax for this:
idx = df.groupby(["col1", "col2"])["col3"].idxmax().to_numpy()
df.loc[idx]
col1 col2 col3 col4
0 A 2021-03-28 01:40:00 1.38 0.00
1 A 2021-03-28 01:50:00 0.48 0.00
3 A 2021-03-28 03:00:00 0.11 0.00
6 A 2021-03-28 03:10:00 0.15 0.00
7 B 2021-03-28 03:10:00 1.23 1.00
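For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the DataFrame from the question and applies the same groupby + idxmax call. The variable name result is my own. Note the assumption behind this approach: idxmax picks the largest col3 per group, which is the non-zero value only when col3 is non-negative and each group has at most one non-zero entry, as in the sample data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["A", "A", "A", "A", "A", "A", "A", "B"],
    "col2": pd.to_datetime([
        "2021-03-28 01:40:00",
        "2021-03-28 01:50:00",
        "2021-03-28 03:00:00",
        "2021-03-28 03:00:00",
        "2021-03-28 03:10:00",
        "2021-03-28 03:10:00",
        "2021-03-28 03:10:00",
        "2021-03-28 03:10:00",
    ]),
    "col3": [1.381158, 0.480089, 0.0, 0.111088, 0.0, 0.0, 0.151066, 1.231341],
    "col4": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
})

# For each (col1, col2) group, idxmax returns the index label of the largest col3
idx = df.groupby(["col1", "col2"])["col3"].idxmax().to_numpy()

# Select those rows; the original index labels are preserved
result = df.loc[idx]
print(result)
```

If col3 could contain negative values, the maximum of a group might be 0 rather than the non-zero entry, so this would need a different selection rule.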
An option via sort_values + drop_duplicates with keep='last':
(If the DataFrame is already sorted as it is in the question, then just drop_duplicates and keep the last row.)
df = df.sort_values(['col1', 'col2', 'col3']) \
.drop_duplicates(['col1', 'col2'], keep='last')
df:
col1 col2 col3 col4
0 A 2021-03-28 01:40:00 1.381158 0.0
1 A 2021-03-28 01:50:00 0.480089 0.0
3 A 2021-03-28 03:00:00 0.111088 0.0
6 A 2021-03-28 03:10:00 0.151066 0.0
7 B 2021-03-28 03:10:00 1.231341 1.0
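To illustrate why the sort matters, here is a small sketch on made-up data (the values and the names deduped and unsorted are hypothetical, not from the question) where the non-zero row comes before the zero row. It assumes, as above, that the value to keep is the largest col3 in each group.

```python
import pandas as pd

# Hypothetical data: in group (A, t1) the non-zero row comes first
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B"],
    "col2": ["t1", "t1", "t2", "t1"],
    "col3": [0.5, 0.0, 0.3, 1.2],
    "col4": [0.0, 0.0, 0.0, 1.0],
})

# Sorting by col3 within each (col1, col2) group puts the largest value last,
# so keep='last' retains it
deduped = (
    df.sort_values(["col1", "col2", "col3"])
      .drop_duplicates(["col1", "col2"], keep="last")
)

# Without the sort, keep='last' simply keeps whichever row appears last
unsorted = df.drop_duplicates(["col1", "col2"], keep="last")

print(deduped)
print(unsorted)
```

Here the unsorted variant keeps the zero row for group (A, t1), which is why skipping sort_values is only safe when the data is already ordered as in the question.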
We can group by col1 and col2 and keep the rows where col3 is non-zero.
Code
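df.col2 = pd.to_datetime(df.col2)  # if required
# keep only the rows whose col3 is non-zero, then renumber the index to match the output below
df[df.groupby(['col1', 'col2'])['col3'].transform(lambda x: x != 0)].reset_index(drop=True)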
Output
col1 col2 col3 col4
0 A 2021-03-28 01:40:00 1.381158 0.0
1 A 2021-03-28 01:50:00 0.480089 0.0
2 A 2021-03-28 03:00:00 0.111088 0.0
3 A 2021-03-28 03:10:00 0.151066 0.0
4 B 2021-03-28 03:10:00 1.231341 1.0
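One caveat worth sketching: because the lambda compares each value with 0 inside its group, the resulting mask should be the same as a plain df["col3"] != 0, so a group whose col3 values are all zero disappears from the result entirely. The example below uses made-up data and my own variable names to show this with the plain mask.

```python
import pandas as pd

# Hypothetical data: group (A, t2) contains only a zero in col3
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B"],
    "col2": ["t1", "t1", "t2", "t1"],
    "col3": [0.0, 0.7, 0.0, 1.2],
    "col4": [0.0, 0.0, 0.0, 1.0],
})

# Elementwise non-zero mask; no grouping is needed for this comparison
mask = df["col3"] != 0

# Keep the non-zero rows and renumber the index from 0
filtered = df[mask].reset_index(drop=True)

# Collect the (col1, col2) pairs that survived the filter
remaining_groups = set(zip(filtered["col1"], filtered["col2"]))
print(filtered)
print(remaining_groups)
```

If every (col1, col2) pair must appear in the output even when all of its col3 values are zero, the idxmax or drop_duplicates approaches are likely a better fit, since they always keep one row per group.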