
Filter a pandas DataFrame by comparing rows

I have a DataFrame of test samples, some of which were redone (suffix _redo). I now want to keep only the originals of the redone samples:

col
a
b
a_redo
b_redo
c
d
e
f
g
g_redo

Expected output:

col
a
b
g

This is the code I use to select only the redo samples (_L, _Q and _S are the redo suffixes):

sample[sample['col'].str.contains("_L|_Q|_S")]

Select only the redo values with Series.str.endswith, strip the suffix with Series.str.replace, then filter the original values in the column with Series.isin:

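# names of the samples that have a redo row, with the suffix stripped literally
vals = sample.loc[sample['col'].str.endswith('_redo'), 'col'].str.replace('_redo', '', regex=False)
# keep only the original rows whose name appears in that list
df = sample[sample['col'].isin(vals)]
print(df)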
  col
0   a
1   b
8   g
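
For readers who want to run the approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the values listed in the question; the variable names `is_redo`, `redone_names` and `originals` are illustrative choices, not taken from the answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
sample = pd.DataFrame(
    {"col": ["a", "b", "a_redo", "b_redo", "c", "d", "e", "f", "g", "g_redo"]}
)

# Boolean mask: True for rows whose name ends with the redo suffix.
is_redo = sample["col"].str.endswith("_redo")

# Strip the suffix literally (regex=False) to recover the original names.
redone_names = sample.loc[is_redo, "col"].str.replace("_redo", "", regex=False)

# Keep only the originals that have a matching redo row.
originals = sample[sample["col"].isin(redone_names)]
print(originals)
```

Rows such as c, d, e and f are dropped because no redo row refers to them, which matches the expected output in the question.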

With your mask:

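# regex=True is required here: since pandas 2.0, str.replace treats the pattern as a literal string by default
vals = sample.loc[sample['col'].str.contains("_L|_Q|_S"), 'col'].str.replace("_L|_Q|_S", '', regex=True)
df = sample[sample['col'].isin(vals)]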
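
A runnable sketch of the multi-suffix variant, under some assumptions: the sample names (`s1`, `s2`, ...) are hypothetical, since the question does not show the real data, and the pattern is anchored with `$` so that only a trailing `_L`, `_Q` or `_S` counts as a redo marker. The unanchored pattern from the question would also match those strings in the middle of a name.

```python
import pandas as pd

# Hypothetical sample names; only the _L/_Q/_S suffixes come from the question.
sample = pd.DataFrame({"col": ["s1", "s2", "s1_L", "s2_Q", "s3", "s4", "s4_S"]})

# Non-capturing group anchored at the end of the string.
redo_pattern = r"(?:_L|_Q|_S)$"

# Mask of redo rows, then their names with the suffix removed.
is_redo = sample["col"].str.contains(redo_pattern, regex=True)
redone_names = sample.loc[is_redo, "col"].str.replace(redo_pattern, "", regex=True)

# Originals that have at least one redo row.
originals = sample[sample["col"].isin(redone_names)]
print(originals)
```

Passing `regex=True` explicitly keeps the behaviour the same across pandas versions, because the default of `Series.str.replace` changed to literal matching in pandas 2.0.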

Alternatively, to select every row that has no redo suffix, invert the mask:

mask_redo = sample['col'].str.contains("_L|_Q|_S")
mask_orig = ~mask_redo
sample_orig = sample.loc[mask_orig]

Placing the ~ operator (the documented way to negate a boolean mask in pandas) before the mask that selects strings containing the redo suffixes inverts the selection: the new mask selects strings that do not contain a redo suffix, i.e. your original samples.
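
The following sketch illustrates mask inversion on a small frame, using `~`, the operator pandas documents for negating a boolean Series. The data is a shortened version of the question's example, and the `_redo` suffix stands in for the real markers. Note that inversion keeps every row without a redo suffix, including samples that were never redone, so it is not equivalent to the `isin`-based filter.

```python
import pandas as pd

# Shortened version of the question's data; "c" was never redone.
sample = pd.DataFrame({"col": ["a", "b", "a_redo", "b_redo", "c", "g", "g_redo"]})

# True for rows that carry the redo suffix (literal match, no regex).
mask_redo = sample["col"].str.contains("_redo", regex=False)

# ~ flips every boolean: True for rows WITHOUT the suffix.
mask_orig = ~mask_redo

sample_orig = sample.loc[mask_orig]
print(sample_orig["col"].tolist())
```

Here `c` survives the filter even though it has no redo counterpart, whereas the expected output in the question would exclude it.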
