
How can I remove rows in Pandas based on the combined sum of multiple values?

I build a DataFrame:

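import pandas as pd
import numpy as np

# 10 rows x 6 columns of random integers from 1 to 5
df = pd.DataFrame(np.random.randint(1, 6, size=(10, 6)),
                  columns=list('ABCDEF'))
# Prefix every value with 'Sp' (on pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
df = df.applymap(lambda x: 'Sp' + str(x))
print(df)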

Gives something like:

     A    B    C    D    E    F
0  Sp4  Sp5  Sp4  Sp4  Sp4  Sp3
1  Sp2  Sp3  Sp5  Sp2  Sp2  Sp3
2  Sp2  Sp3  Sp2  Sp4  Sp5  Sp5
3  Sp5  Sp3  Sp1  Sp4  Sp4  Sp3
4  Sp3  Sp1  Sp1  Sp5  Sp4  Sp1
5  Sp1  Sp4  Sp4  Sp5  Sp4  Sp4
6  Sp2  Sp1  Sp3  Sp4  Sp5  Sp3
7  Sp3  Sp3  Sp2  Sp1  Sp4  Sp4
8  Sp1  Sp1  Sp1  Sp4  Sp2  Sp3
9  Sp5  Sp5  Sp3  Sp4  Sp1  Sp3

How can I remove all rows where (for example) the combined count of Sp2 and Sp3 is greater than 2 (i.e. any combination of them appears more than twice in the same row)?

I've been trying to use pandas.DataFrame.eq.

For example, df[~df.eq('Sp2').sum(1).gt(2)] works, but it only removes rows where Sp2 alone appears more than twice.

I don't know how to incorporate a logical OR to get something like df[~df.eq('Sp2' or 'Sp3').sum(1).gt(2)]

You can use pandas.DataFrame.isin:

new_df = df[df.isin(['Sp2', 'Sp3']).sum(1).le(2)]
print(new_df)

Output:

     A    B    C    D    E    F
0  Sp4  Sp5  Sp4  Sp4  Sp4  Sp3
3  Sp5  Sp3  Sp1  Sp4  Sp4  Sp3
4  Sp3  Sp1  Sp1  Sp5  Sp4  Sp1
5  Sp1  Sp4  Sp4  Sp5  Sp4  Sp4
8  Sp1  Sp1  Sp1  Sp4  Sp2  Sp3
9  Sp5  Sp5  Sp3  Sp4  Sp1  Sp3
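To make the intermediate steps visible, here is a minimal sketch that rebuilds the sample table from the question by hand (the original values are random, so the `rows` list is just a copy of the printed output) and shows the per-row counts that the filter compares against 2. The names `mask` and `counts` are introduced here for illustration only.

```python
import pandas as pd

# Rows copied from the sample output printed in the question
rows = [
    "Sp4 Sp5 Sp4 Sp4 Sp4 Sp3",
    "Sp2 Sp3 Sp5 Sp2 Sp2 Sp3",
    "Sp2 Sp3 Sp2 Sp4 Sp5 Sp5",
    "Sp5 Sp3 Sp1 Sp4 Sp4 Sp3",
    "Sp3 Sp1 Sp1 Sp5 Sp4 Sp1",
    "Sp1 Sp4 Sp4 Sp5 Sp4 Sp4",
    "Sp2 Sp1 Sp3 Sp4 Sp5 Sp3",
    "Sp3 Sp3 Sp2 Sp1 Sp4 Sp4",
    "Sp1 Sp1 Sp1 Sp4 Sp2 Sp3",
    "Sp5 Sp5 Sp3 Sp4 Sp1 Sp3",
]
df = pd.DataFrame([r.split() for r in rows], columns=list("ABCDEF"))

# True wherever a cell is one of the target values
mask = df.isin(["Sp2", "Sp3"])

# Number of matching cells in each row
counts = mask.sum(axis=1)
print(counts.tolist())  # [1, 5, 3, 2, 1, 0, 3, 3, 2, 2]

# Keep only rows with at most two matches
new_df = df[counts.le(2)]
print(new_df.index.tolist())  # [0, 3, 4, 5, 8, 9]
```

Rows 1, 2, 6 and 7 are dropped because their combined Sp2/Sp3 count exceeds 2.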

This answer follows the same logic you were originally attempting. You could try:

new_df = df[~(df.eq('Sp2').add(df.eq('Sp3'), fill_value=0).sum(1).gt(2))]
print(new_df)

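This combines the two comparisons before they are summed, which is effectively a logical OR. Because a cell cannot equal both Sp2 and Sp3 at once, each matching cell is counted exactly once.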
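The same idea can also be written with the element-wise `|` operator, which may read more directly as an OR. The sketch below uses a small made-up frame (not the data from the question) and also shows why the `'Sp2' or 'Sp3'` attempt from the question cannot work: Python evaluates that expression to a single string before pandas ever sees it.

```python
import pandas as pd

# Small hypothetical frame, chosen for illustration only
df = pd.DataFrame({
    "A": ["Sp2", "Sp1", "Sp3"],
    "B": ["Sp3", "Sp2", "Sp3"],
    "C": ["Sp2", "Sp4", "Sp1"],
})

# Pitfall: 'Sp2' or 'Sp3' is evaluated by Python first and yields 'Sp2'
assert ("Sp2" or "Sp3") == "Sp2"

# Element-wise OR of the two boolean masks
either = df.eq("Sp2") | df.eq("Sp3")

# The combined mask is identical to the one produced by isin
assert either.equals(df.isin(["Sp2", "Sp3"]))

# Drop rows where the combined count is greater than 2
new_df = df[~either.sum(axis=1).gt(2)]
print(new_df.index.tolist())  # [1, 2]
```

For more than two target values, `isin` with a list is usually easier to maintain than chaining several `eq` calls.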
