Conditionally merge rows in a pandas DataFrame

Question

I have the following pandas DataFrame:

col1 col2                   col3        col4 
A    2021-03-28 01:40:00    1.381158    0.0
A    2021-03-28 01:50:00    0.480089    0.0
A    2021-03-28 03:00:00    0.000000    0.0
A    2021-03-28 03:00:00    0.111088    0.0
A    2021-03-28 03:10:00    0.000000    0.0
A    2021-03-28 03:10:00    0.000000    0.0
A    2021-03-28 03:10:00    0.151066    0.0
B    2021-03-28 03:10:00    1.231341    1.0

I need to merge rows that have the same col1 and col2 values, keeping the non-zero col3 value from each group.

This is the expected output:

col1 col2                   col3        col4 
A    2021-03-28 01:40:00    1.381158    0.0
A    2021-03-28 01:50:00    0.480089    0.0
A    2021-03-28 03:00:00    0.111088    0.0
A    2021-03-28 03:10:00    0.151066    0.0
B    2021-03-28 03:10:00    1.231341    1.0

How can I merge the rows this way?
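For anyone who wants to try the answers below, here is a minimal sketch that rebuilds the sample data from the question. The values are copied from the table above; parsing col2 as datetime and the helper name dup_counts are my own assumptions, not something the question specifies.

```python
import pandas as pd

# Sample data copied from the question; col2 is parsed as datetime (an assumption).
df = pd.DataFrame(
    {
        "col1": ["A", "A", "A", "A", "A", "A", "A", "B"],
        "col2": pd.to_datetime(
            [
                "2021-03-28 01:40:00",
                "2021-03-28 01:50:00",
                "2021-03-28 03:00:00",
                "2021-03-28 03:00:00",
                "2021-03-28 03:10:00",
                "2021-03-28 03:10:00",
                "2021-03-28 03:10:00",
                "2021-03-28 03:10:00",
            ]
        ),
        "col3": [1.381158, 0.480089, 0.0, 0.111088, 0.0, 0.0, 0.151066, 1.231341],
        "col4": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    }
)

# Number of rows sharing each (col1, col2) key; anything above 1 needs merging.
dup_counts = df.groupby(["col1", "col2"]).size()
print(dup_counts)
```

The groups with a count above 1 are the ones that should collapse to a single row.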

Answer 1

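We can use groupby + idxmax for this. idxmax returns the index label of the largest col3 value in each (col1, col2) group, which is the non-zero one as long as col3 is never negative: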

idx = df.groupby(["col1", "col2"])["col3"].idxmax().to_numpy()
df.loc[idx]

  col1                 col2  col3  col4
0    A  2021-03-28 01:40:00  1.38  0.00
1    A  2021-03-28 01:50:00  0.48  0.00
3    A  2021-03-28 03:00:00  0.11  0.00
6    A  2021-03-28 03:10:00  0.15  0.00
7    B  2021-03-28 03:10:00  1.23  1.00
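One thing to watch with idxmax: it picks the largest value, not the non-zero one. The sketch below uses a small made-up frame (the -0.5 row is hypothetical and does not appear in the question) to show the difference, and a possible variant that ranks by absolute value instead. Treat it as an illustration under that assumption rather than part of the original answer.

```python
import pandas as pd

# Small illustrative frame; the negative value is hypothetical, not from the question.
df = pd.DataFrame(
    {
        "col1": ["A", "A", "A", "A", "B"],
        "col2": pd.to_datetime(
            [
                "2021-03-28 03:00:00",
                "2021-03-28 03:00:00",
                "2021-03-28 03:10:00",
                "2021-03-28 03:10:00",
                "2021-03-28 03:10:00",
            ]
        ),
        "col3": [0.0, 0.111088, 0.0, -0.5, 1.231341],
        "col4": [0.0, 0.0, 0.0, 0.0, 1.0],
    }
)

# Plain idxmax: for group (A, 03:10) the zero row wins because 0.0 > -0.5.
by_max = df.groupby(["col1", "col2"])["col3"].idxmax()

# Variant: rank by absolute value so that a negative non-zero value beats zero.
by_abs = df["col3"].abs().groupby([df["col1"], df["col2"]]).idxmax()

# Select the chosen rows by index label.
merged = df.loc[by_abs.to_numpy()].reset_index(drop=True)
print(merged)
```

With non-negative data like the question's, both variants select the same rows.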

Answer 2

An option via sort_values + drop_duplicates with keep='last':

(If the DataFrame is already sorted as in the question, sort_values can be skipped: drop_duplicates with keep='last' is enough.)

df = df.sort_values(['col1', 'col2', 'col3']) \
    .drop_duplicates(['col1', 'col2'], keep='last')

df:

 col1                col2      col3  col4
0    A 2021-03-28 01:40:00  1.381158   0.0
1    A 2021-03-28 01:50:00  0.480089   0.0
3    A 2021-03-28 03:00:00  0.111088   0.0
6    A 2021-03-28 03:10:00  0.151066   0.0
7    B 2021-03-28 03:10:00  1.231341   1.0
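To see why the sort matters, here is a small sketch with the non-zero row placed first in its group. The row order is my own rearrangement for illustration; the values come from the question.

```python
import pandas as pd

# Same values as the 03:10 rows in the question, but reordered (an assumption)
# so that the non-zero row comes first within group A.
df = pd.DataFrame(
    {
        "col1": ["A", "A", "A", "B"],
        "col2": pd.to_datetime(["2021-03-28 03:10:00"] * 4),
        "col3": [0.151066, 0.0, 0.0, 1.231341],
        "col4": [0.0, 0.0, 0.0, 1.0],
    }
)

# Without sorting, keep='last' keeps whatever row happens to be last: a zero here.
unsorted_result = df.drop_duplicates(["col1", "col2"], keep="last")

# Sorting by col3 within each key pushes the largest value to the end of its group.
sorted_result = df.sort_values(["col1", "col2", "col3"]).drop_duplicates(
    ["col1", "col2"], keep="last"
)
print(sorted_result)
```

Like idxmax, this keeps the largest col3 per group, so it matches "non-zero" only when col3 has no negative values.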

Answer 3

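We can group by col1 and col2 and keep the non-zero values from col3. Note that the lambda is applied element-wise, so this is effectively a row filter on col3 != 0: a group whose col3 values are all zero is dropped, and a group with several non-zero values keeps all of them.

Code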

df.col2 = pd.to_datetime(df.col2)  # only needed if col2 is stored as strings
# keep the rows whose col3 is non-zero
df[df.groupby(['col1', 'col2'])['col3'].transform(lambda x: x != 0)]

Output

  col1                col2      col3  col4
0    A 2021-03-28 01:40:00  1.381158   0.0
1    A 2021-03-28 01:50:00  0.480089   0.0
3    A 2021-03-28 03:00:00  0.111088   0.0
6    A 2021-03-28 03:10:00  0.151066   0.0
7    B 2021-03-28 03:10:00  1.231341   1.0
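If every (col1, col2) key has to survive the merge, including keys where col3 is zero on every row, a possible alternative is to mask zeros as missing and let groupby().first() pick the first non-missing value per column. This is a sketch of a different technique, not the answer's method, and the all-zero group in the sample frame is hypothetical.

```python
import pandas as pd

# Illustrative frame; group (A, 03:10) is all zeros here, which is hypothetical.
df = pd.DataFrame(
    {
        "col1": ["A", "A", "A", "A", "B"],
        "col2": pd.to_datetime(
            [
                "2021-03-28 03:00:00",
                "2021-03-28 03:00:00",
                "2021-03-28 03:10:00",
                "2021-03-28 03:10:00",
                "2021-03-28 03:10:00",
            ]
        ),
        "col3": [0.0, 0.111088, 0.0, 0.0, 1.231341],
        "col4": [0.0, 0.0, 0.0, 0.0, 1.0],
    }
)

# Plain filter: the all-zero group (A, 03:10) disappears from the result.
filtered = df[df["col3"] != 0].reset_index(drop=True)

# Alternative: turn zeros into NaN, take the first non-missing value per group,
# then restore 0.0 for groups that had no non-zero value at all.
merged = (
    df.assign(col3=df["col3"].mask(df["col3"] == 0))
    .groupby(["col1", "col2"], as_index=False)
    .first()
    .fillna({"col3": 0.0})
)
print(merged)
```

Be aware that first() works column by column, so col4 is the first non-missing col4 in the group and does not necessarily come from the same row as the chosen col3.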

3 answers

Answer 1
5 accepted 2021-05-23 14:41:08

Answer 2
1 2021-05-23 14:42:09

Answer 3
1 2021-05-23 14:42:28
