Filter pandas dataframe by comparing rows

Question

I have a DataFrame of test samples, and some of them were redone (marked with a redo suffix). Now I want to keep only the original rows of the samples that were redone.

col
a
b
a_redo
b_redo
c
d
e
f
g
g_redo

Expected output:

col
a
b
g

This is the code that I use to select only the redo samples (_L, _Q and _S are the redo suffixes):

sample[sample['col'].str.contains("_L|_Q|_S")]

Answer 1

Select only the redo values with Series.str.endswith, strip the suffix with Series.str.replace, and then keep the rows whose value appears in that result with Series.isin:

vals = sample.loc[sample['col'].str.endswith("redo"), 'col'].str.replace('_redo','')
df = sample[sample['col'].isin(vals)]
print (df)
  col
0   a
1   b
8   g
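
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the sample column in the question; the names is_redo, redone_names and originals are only illustrative, and regex=False is passed explicitly so the suffix is treated as a literal string regardless of pandas version.

```python
import pandas as pd

# Sample data rebuilt from the question.
sample = pd.DataFrame(
    {"col": ["a", "b", "a_redo", "b_redo", "c", "d", "e", "f", "g", "g_redo"]}
)

# Boolean mask: True for rows that are redo samples.
is_redo = sample["col"].str.endswith("_redo")

# Strip the suffix to get the names of the samples that were redone.
redone_names = sample.loc[is_redo, "col"].str.replace("_redo", "", regex=False)

# Keep only the original rows whose name has a matching redo row.
originals = sample[sample["col"].isin(redone_names)]
print(originals)
```

The original index labels (0, 1, 8) are preserved; call reset_index(drop=True) on the result if a fresh index is preferred.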

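With your mask (pass regex=True, because the pattern is a regular expression and Series.str.replace treats it as a literal string by default since pandas 2.0):

vals = sample.loc[sample['col'].str.contains("_L|_Q|_S"), 'col'].str.replace("_L|_Q|_S", '', regex=True)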
df = sample[sample['col'].isin(vals)]
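
A runnable sketch of the suffix-mask variant follows. The sample names below are made up for illustration (the question does not show data with the _L, _Q, _S suffixes), and the pattern is anchored with $ as an assumption that the markers only ever appear at the end of the name.

```python
import pandas as pd

# Hypothetical data: the question only shows "_redo" names, so these are invented.
sample = pd.DataFrame({"col": ["a", "a_L", "b", "b_Q", "c", "d_S", "d"]})

# Match one of the redo suffixes at the end of the string only.
suffix_pattern = r"_(?:L|Q|S)$"

is_redo = sample["col"].str.contains(suffix_pattern)

# regex=True is required for the pattern to be interpreted as a regex.
redone_names = sample.loc[is_redo, "col"].str.replace(suffix_pattern, "", regex=True)

# Original rows that have at least one redo counterpart.
df = sample[sample["col"].isin(redone_names)]
print(df)
```

Anchoring the pattern avoids accidentally matching a name such as "x_Left" that merely contains "_L" somewhere in the middle.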

Answer 2

mask_redo = sample['col'].str.contains("_L|_Q|_S")
mask_orig = ~mask_redo
sample_orig = sample.loc[mask_orig]

Basically, by placing the ~ operator (the idiomatic way to negate a boolean mask in pandas) before the mask that selects strings containing the redo suffixes, you invert the selection: you now have a mask that selects strings that do not contain redo suffixes, i.e. every sample that is not itself a redo.
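
As a quick check of what the inverted mask returns, here is a small sketch on the question's sample data (which uses the _redo suffix, so the mask below uses endswith("_redo") instead of the _L|_Q|_S pattern). The variable names are illustrative. It also shows one possible way to combine the inverted mask with isin to get only the originals that were actually redone.

```python
import pandas as pd

sample = pd.DataFrame(
    {"col": ["a", "b", "a_redo", "b_redo", "c", "d", "e", "f", "g", "g_redo"]}
)

mask_redo = sample["col"].str.endswith("_redo")

# ~ flips True/False element-wise, so this keeps every non-redo row.
sample_orig = sample.loc[~mask_redo]
print(sample_orig["col"].tolist())

# To keep only originals that have a redo, also require a matching redo name.
redone_names = sample.loc[mask_redo, "col"].str.replace("_redo", "", regex=False)
redone_originals = sample.loc[~mask_redo & sample["col"].isin(redone_names)]
print(redone_originals["col"].tolist())
```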

2 answers

Answer 1: 0, accepted, 2021-02-09 10:59:37
Answer 2: 0, 2021-02-09 11:05:53
