
Conditionally merge rows in pandas DataFrame

I have the following pandas DataFrame:

col1 col2                   col3        col4 
A    2021-03-28 01:40:00    1.381158    0.0
A    2021-03-28 01:50:00    0.480089    0.0
A    2021-03-28 03:00:00    0.000000    0.0
A    2021-03-28 03:00:00    0.111088    0.0
A    2021-03-28 03:10:00    0.000000    0.0
A    2021-03-28 03:10:00    0.000000    0.0
A    2021-03-28 03:10:00    0.151066    0.0
B    2021-03-28 03:10:00    1.231341    1.0

I need to merge rows that have the same col1 and col2 values, keeping the non-zero col3 value from each group.

This is the expected output:

col1 col2                   col3        col4 
A    2021-03-28 01:40:00    1.381158    0.0
A    2021-03-28 01:50:00    0.480089    0.0
A    2021-03-28 03:00:00    0.111088    0.0
A    2021-03-28 03:10:00    0.151066    0.0
B    2021-03-28 03:10:00    1.231341    1.0

How can I do this merging?

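We can use groupby + idxmax for this. idxmax returns the index label of the largest col3 value in each (col1, col2) group, which is the non-zero row as long as col3 is never negative: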

idx = df.groupby(["col1", "col2"])["col3"].idxmax().to_numpy()
df.loc[idx]
  col1                 col2  col3  col4
0    A  2021-03-28 01:40:00  1.38  0.00
1    A  2021-03-28 01:50:00  0.48  0.00
3    A  2021-03-28 03:00:00  0.11  0.00
6    A  2021-03-28 03:10:00  0.15  0.00
7    B  2021-03-28 03:10:00  1.23  1.00
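
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question by hand and applies the same groupby + idxmax idea. The variable name result and the final reset_index call are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["A"] * 7 + ["B"],
    "col2": pd.to_datetime([
        "2021-03-28 01:40:00", "2021-03-28 01:50:00",
        "2021-03-28 03:00:00", "2021-03-28 03:00:00",
        "2021-03-28 03:10:00", "2021-03-28 03:10:00",
        "2021-03-28 03:10:00", "2021-03-28 03:10:00",
    ]),
    "col3": [1.381158, 0.480089, 0.0, 0.111088, 0.0, 0.0, 0.151066, 1.231341],
    "col4": [0.0] * 7 + [1.0],
})

# Index label of the largest col3 value within each (col1, col2) group.
idx = df.groupby(["col1", "col2"])["col3"].idxmax()

# Select those rows; reset_index is optional and only renumbers the result.
result = df.loc[idx].reset_index(drop=True)
print(result)
```

Note that this picks the maximum, not strictly "the non-zero value": if a group could contain negative values alongside zeros, idxmax would return a zero row.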

An option via sort_values + drop_duplicates with keep='last':

(If the DataFrame is already sorted as it is in the question, so that the non-zero row comes last within each group, then drop_duplicates with keep='last' is enough on its own.)

df = df.sort_values(['col1', 'col2', 'col3']) \
    .drop_duplicates(['col1', 'col2'], keep='last')

df:

  col1                col2      col3  col4
0    A 2021-03-28 01:40:00  1.381158   0.0
1    A 2021-03-28 01:50:00  0.480089   0.0
3    A 2021-03-28 03:00:00  0.111088   0.0
6    A 2021-03-28 03:10:00  0.151066   0.0
7    B 2021-03-28 03:10:00  1.231341   1.0
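
To see why the sort matters, here is a small sketch on hypothetical data (not the question's exact frame) in which the non-zero row comes first in its group. The names keys, unsorted_result and sorted_result are made up for this illustration.

```python
import pandas as pd

# Hypothetical sample: in the first group the non-zero row is NOT last.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B"],
    "col2": pd.to_datetime(
        ["2021-03-28 03:00:00"] * 2 + ["2021-03-28 03:10:00"] * 2
    ),
    "col3": [0.111088, 0.0, 0.151066, 1.231341],
    "col4": [0.0, 0.0, 0.0, 1.0],
})
keys = ["col1", "col2"]

# Without sorting, keep='last' keeps whichever row happens to be last,
# which here is the zero row of the first group.
unsorted_result = df.drop_duplicates(keys, keep="last")

# Sorting by col3 within each key pushes the largest value to the end
# of its group, so keep='last' then retains it.
sorted_result = df.sort_values(keys + ["col3"]).drop_duplicates(keys, keep="last")

print(unsorted_result["col3"].tolist())
print(sorted_result["col3"].tolist())
```

As with idxmax, this keeps the largest col3 per group, so it matches "the non-zero value" only when col3 is never negative.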

We can groupby col1 and col2 and keep the non-zero values from col3.

Code

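df.col2 = pd.to_datetime(df.col2)  # only needed if col2 is not already datetime
# x != 0 is element-wise, so this mask equals df['col3'] != 0
mask = df.groupby(['col1', 'col2'])['col3'].transform(lambda x: x != 0).astype(bool)
df[mask].reset_index(drop=True)  # reset the index to match the output below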

Output

  col1                col2      col3  col4
0    A 2021-03-28 01:40:00  1.381158   0.0
1    A 2021-03-28 01:50:00  0.480089   0.0
2    A 2021-03-28 03:00:00  0.111088   0.0
3    A 2021-03-28 03:10:00  0.151066   0.0
4    B 2021-03-28 03:10:00  1.231341   1.0
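
One caveat with filtering on non-zero col3: a (col1, col2) group whose col3 values are all zero disappears from the result entirely. The question does not say what should happen in that case, so the following is only a sketch under the assumption that one zero row per such group should be kept. The sample data and the names keys, nonzero and group_has_nonzero are hypothetical.

```python
import pandas as pd

# Hypothetical sample: the second group contains only zeros.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "A"],
    "col2": pd.to_datetime(
        ["2021-03-28 03:00:00"] * 2 + ["2021-03-28 03:20:00"] * 2
    ),
    "col3": [0.0, 0.111088, 0.0, 0.0],
    "col4": [0.0] * 4,
})
keys = ["col1", "col2"]

# Row-wise mask: True where col3 is non-zero.
nonzero = df["col3"].ne(0)

# Broadcast per group: True for every row of a group with any non-zero col3.
group_has_nonzero = nonzero.groupby([df[k] for k in keys]).transform("any")

# Keep the non-zero rows, plus the first row of each all-zero group.
result = pd.concat([
    df[nonzero],
    df[~group_has_nonzero].drop_duplicates(keys),
]).sort_index()
print(result)
```

If a group holds several distinct non-zero values, all of them are kept here, just as with the plain filter; idxmax or drop_duplicates is the better fit when exactly one row per group is required.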
