Delete rows based on multiple conditions in different columns / Python Pandas

Question

This is my very first post, so I hope I am asking the question properly. In the df below, some rows need to be deleted based on multiple conditions.

All rows where an "ID" exists (once or multiple times) and every row with that "ID" shows "confTyp" == "new" & "trType" == "order" & "Version" == 1 are valid entries.

Now, if an "ID" is not unique and one of the rows with the same "ID" shows "confTyp" != "new" or "trType" != "order", all rows with that "ID" need to be deleted. This also means that the initial row for that "ID", with the supposedly correct "confTyp", "trType" and "Version", has to be deleted.

Deleting whatever is != "new" would still leave the original entry, which would then have to be deleted as well.

I have tried the df.drop() method in many different ways, but I am far from a good solution. Does anyone have an idea which method would be adequate?

Thank you for your help.

I have the following dataframe:

ID	confTyp	trType	Version
100	new	order	1
101	new	order	1
102	new	order	1
103	new	order	1
104	new	order	1
105	new	order	1
106	new	order	1
107	replace	manual	1
106	cancel	cancel	2
106	replace	manual	1
105	replace	replace	2
104	cancel	cancel	2
108	new	order	1

The goal is the following output:

ID	confTyp	trType	Version
100	new	order	1
101	new	order	1
102	new	order	1
103	new	order	1
108	new	order	1
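
For anyone who wants to reproduce this, the input table above could be built roughly as follows. This is only a sketch: the dtypes (integers for ID and Version, strings for the other two columns) are an assumption, since the post shows only the printed values.

```python
import pandas as pd

# Input table from the question. The dtypes are assumed, not stated in the post.
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104, 105, 106, 107, 106, 106, 105, 104, 108],
    "confTyp": ["new", "new", "new", "new", "new", "new", "new",
                "replace", "cancel", "replace", "replace", "cancel", "new"],
    "trType": ["order", "order", "order", "order", "order", "order", "order",
               "manual", "cancel", "manual", "replace", "cancel", "order"],
    "Version": [1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1],
})
```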

Answer 1

IIUC, you can try:

valid = df.groupby('ID').apply(lambda x: all([
    set(x['confTyp']) == {'new'}, set(x['trType']) == {'order'},
    set(x['Version']) == {1}]))
df = df.set_index('ID')[valid]

OUTPUT:

   confTyp trType  Version
ID                         
100     new  order        1
101     new  order        1
102     new  order        1
103     new  order        1
108     new  order        1
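
A variation on the same idea, sketched with groupby().filter() so that ID stays a regular column. This is not the answerer's code; the names `is_valid_group` and `result` are made up for illustration, and the DataFrame is rebuilt here with assumed dtypes only so that the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question (dtypes assumed).
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104, 105, 106, 107, 106, 106, 105, 104, 108],
    "confTyp": ["new"] * 7 + ["replace", "cancel", "replace", "replace", "cancel", "new"],
    "trType": ["order"] * 7 + ["manual", "cancel", "manual", "replace", "cancel", "order"],
    "Version": [1] * 7 + [1, 2, 1, 2, 2, 1],
})

def is_valid_group(group):
    # Hypothetical helper: True only if every row of this ID is new / order / version 1.
    row_ok = (group["confTyp"].eq("new")
              & group["trType"].eq("order")
              & group["Version"].eq(1))
    return bool(row_ok.all())

# filter() keeps all rows of the groups for which the function returns True,
# in their original order and with their original row labels.
result = df.groupby("ID").filter(is_valid_group)
print(result)
```

Compared with set_index plus a boolean mask, this leaves the original row index untouched, which can be handy if the rows need to be matched back to the source later.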

Answer 2

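IIUC, you want something like this (assuming ID is the index of your DataFrame df):

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the valid
# groups in a list and concatenate them once at the end.
kept = []
for ID in df.index.unique():
    sample = df[df.index == ID]
    # Keep an ID only if every one of its rows is new / order / version 1.
    if ((sample["confTyp"] == "new").all() and (sample["trType"] == "order").all()
            and (sample["Version"] == 1).all()):
        kept.append(sample)
# pd.concat raises on an empty list, so fall back to an empty frame.
output = pd.concat(kept) if kept else df.iloc[0:0]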

>>> output
    confTyp trType  Version
ID                         
100     new  order        1
101     new  order        1
102     new  order        1
103     new  order        1
108     new  order        1

Answer 3

You could find the IDs that don't match the condition and filter the original DataFrame by them:

ids = df.loc[(df.confTyp != 'new') | (df.trType != 'order') | (df.Version != 1)].ID
df = df[~df.ID.isin(ids)]

ID   confTyp trType  Version
100  new     order   1
101  new     order   1
102  new     order   1
103  new     order   1
108  new     order   1
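
The same result can probably also be reached without collecting the bad IDs first, by broadcasting a per-row check back over each ID with groupby().transform("all"). The following is a sketch under the assumption that ID is a regular column; `row_ok`, `id_ok` and `result` are illustrative names, not part of the answer above.

```python
import pandas as pd

# Sample data from the question (dtypes assumed).
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104, 105, 106, 107, 106, 106, 105, 104, 108],
    "confTyp": ["new"] * 7 + ["replace", "cancel", "replace", "replace", "cancel", "new"],
    "trType": ["order"] * 7 + ["manual", "cancel", "manual", "replace", "cancel", "order"],
    "Version": [1] * 7 + [1, 2, 1, 2, 2, 1],
})

# Per-row check: does this single row look like a valid entry?
row_ok = df["confTyp"].eq("new") & df["trType"].eq("order") & df["Version"].eq(1)

# Per-ID check, repeated on every row: True only if all rows of that ID passed.
id_ok = row_ok.groupby(df["ID"]).transform("all")

result = df[id_ok]
print(result)
```

Both versions drop every row of an ID as soon as one of its rows fails the check; the isin approach is arguably easier to read, while the transform version keeps everything in one boolean mask.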
