
Removing rows from a DataFrame based on a condition

I have the following dataframe:

df = pd.DataFrame({"Code": ['9958S135K108MF-1','9958S135-1','9958S105-1','9958S105K84MF-1',], "ID": ['FO995877000581098', 'FO995877000581098','FO995877000581098','FO995877000581098',], "NUM": ['9958S135','9958S135','9958S105','9958S105']})

I need the following output:

    Code                ID                  NUM
0   9958S135K108MF-1    FO995877000581098   9958S135
3   9958S105K84MF-1     FO995877000581098   9958S105

For every "ID", each "NUM" should appear only once. Many rows will share the same "ID".

The trick is that, when dropping rows that duplicate both "ID" and "NUM", I need to keep the row whose "Code" contains 'MF-1' and remove the other one.

I have tried adding a "Mapping" column and deleting the rows where it is True, but True is not always assigned to the correct row: as shown below, it can land on the row whose "Code" contains 'MF-1'.

Here is what I have tried:

import pandas as pd

df['Mapping'] = df['NUM'].eq(df['NUM'].shift()) & df['ID'].eq(df['ID'].shift())

    Code                ID                  NUM         Mapping
0   9958S135K108MF-1    FO995877000581098   9958S135    False
1   9958S135-1          FO995877000581098   9958S135    True
2   9958S105-1          FO995877000581098   9958S105    False
3   9958S105K84MF-1     FO995877000581098   9958S105    True
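
The shift-based comparison flags whichever row comes second within a pair, regardless of its "Code", which is why True lands on row 3 above. As a rough sketch, a "Mapping" column that does not depend on row order could combine `duplicated(..., keep=False)` with a check on "Code". This assumes, as in the desired output, that the rows containing 'MF-1' are the ones to keep; the names `is_dup`, `has_mf1` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Code": ["9958S135K108MF-1", "9958S135-1", "9958S105-1", "9958S105K84MF-1"],
    "ID": ["FO995877000581098"] * 4,
    "NUM": ["9958S135", "9958S135", "9958S105", "9958S105"],
})

# True for every row whose (ID, NUM) pair occurs more than once
is_dup = df.duplicated(["ID", "NUM"], keep=False)
# True for rows whose Code contains the literal text 'MF-1'
has_mf1 = df["Code"].astype(str).str.contains("MF-1", regex=False)

# Flag the rows to drop: members of a duplicated pair without 'MF-1' in Code
df["Mapping"] = is_dup & ~has_mf1

# Keep the unflagged rows and discard the helper column
result = df.loc[~df["Mapping"]].drop(columns="Mapping")
print(result)
```

With the sample data, "Mapping" is True for rows 1 and 2 only, so rows 0 and 3 remain.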

I was able to achieve the desired output using the following:

df[~df.duplicated(['ID', 'NUM'], keep=False) | df['Code'].astype(str).str.contains('MF-1')]
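
One thing to be aware of with the filter above: if a duplicated ("ID", "NUM") pair has no row containing 'MF-1', every row of that pair is dropped, and if several rows contain 'MF-1', all of them are kept. If exactly one row per pair is wanted, a possible alternative (a sketch only, not tested beyond the sample data) is to sort the 'MF-1' rows to the front and then use `drop_duplicates`. The helper column `_has_mf1` is a made-up name for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Code": ["9958S135K108MF-1", "9958S135-1", "9958S105-1", "9958S105K84MF-1"],
    "ID": ["FO995877000581098"] * 4,
    "NUM": ["9958S135", "9958S135", "9958S105", "9958S105"],
})

# True for rows whose Code contains the literal text 'MF-1'
has_mf1 = df["Code"].astype(str).str.contains("MF-1", regex=False)

result = (
    df.assign(_has_mf1=has_mf1)
      # Put the 'MF-1' rows first so drop_duplicates prefers them
      .sort_values("_has_mf1", ascending=False, kind="stable")
      # Keep a single row per (ID, NUM) pair
      .drop_duplicates(["ID", "NUM"], keep="first")
      .drop(columns="_has_mf1")
      # Restore the original row order
      .sort_index()
)
print(result)
```

On the sample data this gives the same rows (0 and 3) as the filter above; the two approaches differ only when a pair has zero or several 'MF-1' rows.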
