I build a DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,6,size=(10, 6)),
columns=list('ABCDEF'))
# Prefix every value with 'Sp' (on pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
df = df.applymap(lambda x: 'Sp'+str(x))
print(df)
Gives something like:
     A    B    C    D    E    F
0  Sp4  Sp5  Sp4  Sp4  Sp4  Sp3
1  Sp2  Sp3  Sp5  Sp2  Sp2  Sp3
2  Sp2  Sp3  Sp2  Sp4  Sp5  Sp5
3  Sp5  Sp3  Sp1  Sp4  Sp4  Sp3
4  Sp3  Sp1  Sp1  Sp5  Sp4  Sp1
5  Sp1  Sp4  Sp4  Sp5  Sp4  Sp4
6  Sp2  Sp1  Sp3  Sp4  Sp5  Sp3
7  Sp3  Sp3  Sp2  Sp1  Sp4  Sp4
8  Sp1  Sp1  Sp1  Sp4  Sp2  Sp3
9  Sp5  Sp5  Sp3  Sp4  Sp1  Sp3
How can I remove all rows where, for example, the combined count of Sp2 and Sp3 is greater than 2 (i.e. any combination of them appears more than twice in a row)?
I've been trying to use pandas.DataFrame.eq, like this:
df[~df.eq('Sp2').sum(1).gt(2)]
but this only removes rows where Sp2 alone appears more than twice.
I don't know how to incorporate a logical OR to get something like df[~df.eq('Sp2' or 'Sp3').sum(1).gt(2)]
Using pandas.DataFrame.isin:
new_df = df[df.isin(['Sp2', 'Sp3']).sum(1).le(2)]
print(new_df)
Output:
     A    B    C    D    E    F
0  Sp4  Sp5  Sp4  Sp4  Sp4  Sp3
3  Sp5  Sp3  Sp1  Sp4  Sp4  Sp3
4  Sp3  Sp1  Sp1  Sp5  Sp4  Sp1
5  Sp1  Sp4  Sp4  Sp5  Sp4  Sp4
8  Sp1  Sp1  Sp1  Sp4  Sp2  Sp3
9  Sp5  Sp5  Sp3  Sp4  Sp1  Sp3
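To see why this works, it can help to look at the intermediate per-row counts. The sketch below hard-codes the sample frame printed in the question so the result is reproducible (the question's own frame is random); the names `rows`, `mask` and `counts` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question, hard-coded so the result is reproducible.
rows = [
    "Sp4 Sp5 Sp4 Sp4 Sp4 Sp3",
    "Sp2 Sp3 Sp5 Sp2 Sp2 Sp3",
    "Sp2 Sp3 Sp2 Sp4 Sp5 Sp5",
    "Sp5 Sp3 Sp1 Sp4 Sp4 Sp3",
    "Sp3 Sp1 Sp1 Sp5 Sp4 Sp1",
    "Sp1 Sp4 Sp4 Sp5 Sp4 Sp4",
    "Sp2 Sp1 Sp3 Sp4 Sp5 Sp3",
    "Sp3 Sp3 Sp2 Sp1 Sp4 Sp4",
    "Sp1 Sp1 Sp1 Sp4 Sp2 Sp3",
    "Sp5 Sp5 Sp3 Sp4 Sp1 Sp3",
]
df = pd.DataFrame([r.split() for r in rows], columns=list("ABCDEF"))

# Boolean mask: True wherever a cell is 'Sp2' or 'Sp3'.
mask = df.isin(["Sp2", "Sp3"])

# Summing along axis=1 counts the matching cells in each row.
counts = mask.sum(axis=1)
print(counts.tolist())  # [1, 5, 3, 2, 1, 0, 3, 3, 2, 2]

# Keep only the rows with at most two matches.
new_df = df[counts.le(2)]
print(new_df.index.tolist())  # [0, 3, 4, 5, 8, 9]
```

Rows 1, 2, 6 and 7 each contain three or more Sp2/Sp3 cells, so they are the ones dropped.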
This answer follows the same logic you were initially attempting to use. You could try:
new_df = df[~(df.eq('Sp2').add(df.eq('Sp3'), fill_value=0).sum(1).gt(2))]
print(new_df)
This combines the two boolean masks before they are summed, which is effectively a logical OR.
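As a side note, the `'Sp2' or 'Sp3'` expression from the question cannot work, because Python evaluates it to plain `'Sp2'` before pandas ever sees it. An element-wise OR between two boolean frames is usually written with the `|` operator instead. Here is a minimal sketch on a small made-up frame (`toy`, `either`, `kept` and `same` are hypothetical names used only for illustration):

```python
import pandas as pd

# Small made-up frame, for illustration only.
toy = pd.DataFrame(
    [["Sp2", "Sp3", "Sp2"],
     ["Sp1", "Sp2", "Sp4"],
     ["Sp3", "Sp3", "Sp3"]],
    columns=list("ABC"),
)

# Python's `or` returns its first truthy operand, so this is just 'Sp2'.
assert ("Sp2" or "Sp3") == "Sp2"

# Element-wise OR of the two boolean frames.
either = toy.eq("Sp2") | toy.eq("Sp3")

# Drop rows where the combined count exceeds 2.
kept = toy[~either.sum(axis=1).gt(2)]
print(kept.index.tolist())  # [1]

# The isin-based filter selects the same rows.
same = toy[toy.isin(["Sp2", "Sp3"]).sum(axis=1).le(2)]
assert kept.equals(same)
```

With more than two values to match, chaining `|` gets verbose, which is one reason `isin` with a list tends to be the more convenient form.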