I have a DataFrame of test samples, and some of them were redone (redo). Now I want to filter only the original samples that have a redo.

Input:
col
a
b
a_redo
b_redo
c
d
e
f
g
g_redo
Expected output:
col
a
b
g
This is the code I use to filter only the redo samples (_L, _Q and _S are the redo suffixes):
sample[sample['col'].str.contains("_L|_Q|_S")]
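For a reproducible setup, here is a minimal sketch that rebuilds the example column and applies the same kind of str.contains filter. The real data uses the _L, _Q and _S suffixes; this sketch assumes the _redo suffix from the example instead, with $ anchoring the match to the end of the name.

```python
import pandas as pd

# Example data from the question: original samples plus their "_redo" copies
sample = pd.DataFrame({"col": ["a", "b", "a_redo", "b_redo", "c", "d",
                               "e", "f", "g", "g_redo"]})

# True for rows whose name ends with the redo suffix (str.contains uses a regex by default)
mask_redo = sample["col"].str.contains("_redo$")

redo_only = sample[mask_redo]
print(redo_only["col"].tolist())  # ['a_redo', 'b_redo', 'g_redo']
```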
Filter only the redo values with Series.str.endswith, remove the suffix with Series.str.replace, and then keep the matching original values in the column with Series.isin:
vals = sample.loc[sample['col'].str.endswith("redo"), 'col'].str.replace('_redo','')
df = sample[sample['col'].isin(vals)]
print (df)
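A self-contained version of these three steps, as a sketch on the example data. It passes regex=True explicitly (pandas 2.0 changed the default of Series.str.replace to regex=False) and anchors the pattern with $, so only a trailing _redo is stripped from each name.

```python
import pandas as pd

sample = pd.DataFrame({"col": ["a", "b", "a_redo", "b_redo", "c", "d",
                               "e", "f", "g", "g_redo"]})

# 1. Keep only the redo names
redo = sample.loc[sample["col"].str.endswith("_redo"), "col"]

# 2. Strip the trailing suffix to recover the original names
vals = redo.str.replace(r"_redo$", "", regex=True)

# 3. Keep the original rows whose name appears among the stripped redo names
df = sample[sample["col"].isin(vals)]
print(df["col"].tolist())  # ['a', 'b', 'g']
```

The original index labels are preserved, so the result keeps rows 0, 1 and 8 as in the printed output.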
  col
0   a
1   b
8   g
With your mask:
vals = sample.loc[sample['col'].str.contains("_L|_Q|_S"), 'col'].str.replace("_L|_Q|_S", '', regex=True)
df = sample[sample['col'].isin(vals)]
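The same idea with the _L, _Q and _S suffixes, as a sketch on hypothetical sample names (a_L, b_Q and g_S are made up for illustration). One anchored pattern is reused for both the mask and the replacement, so a name that merely contains "_L" somewhere in the middle is neither selected nor altered.

```python
import pandas as pd

# Hypothetical data using the redo suffixes _L, _Q and _S
sample = pd.DataFrame({"col": ["a", "b", "a_L", "b_Q", "c", "g", "g_S"]})

# One of the redo suffixes, only at the very end of the name
suffix = r"_(?:L|Q|S)$"

mask_redo = sample["col"].str.contains(suffix)
# regex=True is needed so the pattern is not treated as a literal string
vals = sample.loc[mask_redo, "col"].str.replace(suffix, "", regex=True)

df = sample[sample["col"].isin(vals)]
print(df["col"].tolist())  # ['a', 'b', 'g']
```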
mask_redo = sample['col'].str.contains("_L|_Q|_S")
mask_orig = ~mask_redo
sample_orig = sample.loc[mask_orig]
Placing ~ (boolean NOT) before the mask that selects strings containing the redo suffixes inverts the selection: the new mask selects strings that do not contain a redo suffix, i.e. your original samples. Note that this keeps every original sample (a to g in the example), not only the ones that were redone.
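A sketch that contrasts the inverted mask with the isin approach, assuming the _redo suffix from the example data. ~ is the idiomatic boolean NOT for pandas masks; combining it with isin splits the originals into those that were redone and those that never were.

```python
import pandas as pd

sample = pd.DataFrame({"col": ["a", "b", "a_redo", "b_redo", "c", "d",
                               "e", "f", "g", "g_redo"]})

mask_redo = sample["col"].str.contains("_redo$")

# Every row that is not a redo
all_orig = sample[~mask_redo]

# Names of the originals that have a redo copy
redone = sample.loc[mask_redo, "col"].str.replace(r"_redo$", "", regex=True)

# Split the originals by whether a redo exists
orig_redone = all_orig[all_orig["col"].isin(redone)]
orig_not_redone = all_orig[~all_orig["col"].isin(redone)]

print(all_orig["col"].tolist())         # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(orig_redone["col"].tolist())      # ['a', 'b', 'g']
print(orig_not_redone["col"].tolist())  # ['c', 'd', 'e', 'f']
```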