
Delete rows based on multiple conditions in different columns / Python Pandas

This is my very first post, so I hope I am asking the question properly. In the df below, some rows need to be deleted based on multiple conditions in different columns.

All rows for an "ID" (which may occur once or multiple times) are valid entries if every one of them shows "confTyp" == "new" & "trType" == "order" & "Version" == 1.

Now, if an "ID" is not unique and one of the rows with that "ID" shows "confTyp" != "new" or "trType" != "order", all rows with the same "ID" need to be deleted. This means the initial row for that "ID", with its supposedly correct "confTyp", "trType" and "Version", has to be deleted as well.

Deleting only the rows that are != "new" would still leave the original entry, which would then have to be deleted as well.
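To make that pitfall concrete, here is a minimal sketch that rebuilds the sample data from the table in this question and applies a plain row-wise filter. The variable name `naive` is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104, 105, 106, 107, 106, 106, 105, 104, 108],
    "confTyp": ["new"] * 7 + ["replace", "cancel", "replace", "replace", "cancel", "new"],
    "trType": ["order"] * 7 + ["manual", "cancel", "manual", "replace", "cancel", "order"],
    "Version": [1] * 7 + [1, 2, 1, 2, 2, 1],
})

# Row-wise filter: keeps every row that looks valid on its own
naive = df[(df["confTyp"] == "new") & (df["trType"] == "order") & (df["Version"] == 1)]

# IDs 104, 105 and 106 are still present, although other rows
# with the same ID are invalid
print(naive["ID"].tolist())
```

The filter has to be decided per "ID", not per row, which is what makes this harder than a simple boolean mask.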

I have tried the df.drop() method in many different ways, but I am far from a good solution. Does anyone have an idea which method would be adequate?

Thank you for your help.

I have the following dataframe:

ID confTyp trType Version
100 new order 1
101 new order 1
102 new order 1
103 new order 1
104 new order 1
105 new order 1
106 new order 1
107 replace manual 1
106 cancel cancel 2
106 replace manual 1
105 replace replace 2
104 cancel cancel 2
108 new order 1

The goal is the following output:

ID confTyp trType Version
100 new order 1
101 new order 1
102 new order 1
103 new order 1
108 new order 1

IIUC, you can try:

# One boolean per ID: True only if all rows of that ID are new / order / Version 1
valid = df.groupby('ID').apply(lambda x: all([
    set(x['confTyp']) == {'new'},
    set(x['trType']) == {'order'},
    set(x['Version']) == {1},
]))
df = df.set_index('ID')[valid]

OUTPUT:

   confTyp trType  Version
ID                         
100     new  order        1
101     new  order        1
102     new  order        1
103     new  order        1
108     new  order        1
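If ID should stay a regular column, one variation (not part of the answer above) is `DataFrameGroupBy.filter`, which keeps or drops whole groups at once. A minimal sketch, with the sample data rebuilt from the question and a hypothetical helper named `group_is_valid`:

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104, 105, 106, 107, 106, 106, 105, 104, 108],
    "confTyp": ["new"] * 7 + ["replace", "cancel", "replace", "replace", "cancel", "new"],
    "trType": ["order"] * 7 + ["manual", "cancel", "manual", "replace", "cancel", "order"],
    "Version": [1] * 7 + [1, 2, 1, 2, 2, 1],
})

def group_is_valid(g):
    # Hypothetical helper: True only if every row of this ID meets all three conditions
    return bool((g["confTyp"].eq("new") & g["trType"].eq("order") & g["Version"].eq(1)).all())

# filter() keeps all rows of the groups for which the function returns True
result = df.groupby("ID").filter(group_is_valid)
print(result)  # IDs 100, 101, 102, 103 and 108 remain
```

Unlike the `set_index` version, this keeps the original row order and the default integer index.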

IIUC, you want something like this (assuming ID is the index of your DataFrame df):

import pandas as pd

kept = []  # one sub-frame per ID whose rows are all valid
for ID in df.index.unique():
    sample = df[df.index == ID]
    # Skip the ID entirely if any of its rows violates a condition
    if not ((sample["confTyp"] == "new").all()
            and (sample["trType"] == "order").all()
            and (sample["Version"] == 1).all()):
        continue
    kept.append(sample)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
output = pd.concat(kept) if kept else df.iloc[0:0]

>>> output
    confTyp trType  Version
ID                         
100     new  order        1
101     new  order        1
102     new  order        1
103     new  order        1
108     new  order        1

You could find the IDs that don't match the condition and filter the original DataFrame by them:

ids = df.loc[(df.confTyp != 'new') | (df.trType != 'order') | (df.Version != 1)].ID
df = df[~df.ID.isin(ids)]

ID   confTyp trType  Version
100  new     order   1
101  new     order   1
102  new     order   1
103  new     order   1
108  new     order   1
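The same idea can also be written without collecting the offending IDs first. The following is a sketch of a variation (not taken from the answer above): it marks each row as valid or not, then uses `groupby(...).transform("all")` to broadcast the per-ID verdict back to every row. The names `row_ok` and `id_ok` are illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104, 105, 106, 107, 106, 106, 105, 104, 108],
    "confTyp": ["new"] * 7 + ["replace", "cancel", "replace", "replace", "cancel", "new"],
    "trType": ["order"] * 7 + ["manual", "cancel", "manual", "replace", "cancel", "order"],
    "Version": [1] * 7 + [1, 2, 1, 2, 2, 1],
})

# Per-row check: does this single row meet all three conditions?
row_ok = df["confTyp"].eq("new") & df["trType"].eq("order") & df["Version"].eq(1)

# Per-ID check, aligned to the original rows: True only if all rows of the ID pass
id_ok = row_ok.groupby(df["ID"]).transform("all")

result = df[id_ok]
print(result)  # IDs 100, 101, 102, 103 and 108 remain
```

Both versions avoid a Python-level loop over the groups, which may matter on larger frames.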
