This is my very first post, so I hope I am asking the question properly. In the df below there are rows that need to be deleted based on multiple conditions.
An "ID" may appear once or multiple times. If all rows for an "ID" show "confTyp" == "new" & "trType" == "order" & "Version" == 1, they are valid entries.
Now, if an "ID" is not unique and one of the rows with that "ID" shows "confTyp" != "new" or "trType" != "order", all rows with the same "ID" need to be deleted. This means the initial row for that "ID", with supposedly correct "confTyp", "trType" and "Version", also has to be deleted.
Simply deleting whatever is != "new" would still leave the original entry, which would then have to be deleted as well.
I have tried the df.drop() method in many different ways, but I am far from a good solution. Does anyone have an idea which method would be adequate?
Thank you for your help.
I have the following dataframe:
ID | confTyp | trType | Version |
---|---|---|---|
100 | new | order | 1 |
101 | new | order | 1 |
102 | new | order | 1 |
103 | new | order | 1 |
104 | new | order | 1 |
105 | new | order | 1 |
106 | new | order | 1 |
107 | replace | manual | 1 |
106 | cancel | cancel | 2 |
106 | replace | manual | 1 |
105 | replace | replace | 2 |
104 | cancel | cancel | 2 |
108 | new | order | 1 |
The goal is the following output:
ID | confTyp | trType | Version |
---|---|---|---|
100 | new | order | 1 |
101 | new | order | 1 |
102 | new | order | 1 |
103 | new | order | 1 |
108 | new | order | 1 |
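For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the input table and shows why filtering row by row is not enough. The integer dtypes and the helper name `valid` are assumptions made for illustration; they are not part of the original post.

```python
import pandas as pd

# Example data copied from the input table; integer dtypes are assumed.
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104, 105, 106, 107, 106, 106, 105, 104, 108],
    "confTyp": ["new"] * 7 + ["replace", "cancel", "replace", "replace", "cancel", "new"],
    "trType": ["order"] * 7 + ["manual", "cancel", "manual", "replace", "cancel", "order"],
    "Version": [1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1],
})

# Row-wise check: a row is a valid entry only if all three conditions hold.
valid = (df["confTyp"] == "new") & (df["trType"] == "order") & (df["Version"] == 1)

# Dropping only the invalid rows still leaves the original entries for 104, 105 and 106.
naive_ids = df.loc[valid, "ID"].tolist()
print(naive_ids)  # [100, 101, 102, 103, 104, 105, 106, 108]
```

The remaining rows for 104, 105 and 106 are exactly the "original entries" that still have to go, which is why the decision has to be made per ID rather than per row.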
IIUC, you can try:
df = df.set_index('ID')[df.groupby('ID').apply(lambda x: all([
    set(x['confTyp']) == {'new'},
    set(x['trType']) == {'order'},
    set(x['Version']) == {1}]))]
OUTPUT:
confTyp trType Version
ID
100 new order 1
101 new order 1
102 new order 1
103 new order 1
108 new order 1
IIUC, you want something like this (assuming ID is the index of your DataFrame df):
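kept = []
for ID in df.index.unique():
    sample = df[df.index == ID]
    # Skip IDs that occur more than once and contain a non-"new" and a non-"order" row.
    if sample.shape[0] > 1 and any(sample["confTyp"] != "new") and any(sample["trType"] != "order"):
        continue
    # Skip IDs whose rows are not all valid entries.
    if not (all(sample["confTyp"] == "new") and all(sample["trType"] == "order") and all(sample["Version"] == 1)):
        continue
    kept.append(sample)
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once.
output = pd.concat(kept)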
>>> output
confTyp trType Version
ID
100 new order 1
101 new order 1
102 new order 1
103 new order 1
108 new order 1
You could find the IDs that don't match the condition and filter the original DataFrame by them:
ids = df.loc[(df.confTyp != 'new') | (df.trType != 'order') | (df.Version != 1)].ID
df = df[~df.ID.isin(ids)]
ID confTyp trType Version
100 new order 1
101 new order 1
102 new order 1
103 new order 1
108 new order 1
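The same idea can also be written without the intermediate list of IDs. The following is only a sketch, assuming the column names from the question and integer dtypes: it marks each row as valid or not, then uses groupby with transform("all") to broadcast "every row of this ID is valid" back onto each row. The names `valid`, `keep` and `result` are made up for illustration.

```python
import pandas as pd

# Example data copied from the question; integer dtypes are assumed.
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104, 105, 106, 107, 106, 106, 105, 104, 108],
    "confTyp": ["new"] * 7 + ["replace", "cancel", "replace", "replace", "cancel", "new"],
    "trType": ["order"] * 7 + ["manual", "cancel", "manual", "replace", "cancel", "order"],
    "Version": [1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1],
})

# Row-wise validity, as described in the question.
valid = (df["confTyp"] == "new") & (df["trType"] == "order") & (df["Version"] == 1)

# True for a row only if every row sharing its ID is valid.
keep = valid.groupby(df["ID"]).transform("all")

result = df[keep]
print(result["ID"].tolist())  # [100, 101, 102, 103, 108]
```

This keeps ID as a regular column and preserves the original row order, which may or may not matter for your use case.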