I have the following dataframe:
import pandas as pd

df = pd.DataFrame({
    "Code": ['9958S135K108MF-1', '9958S135-1', '9958S105-1', '9958S105K84MF-1'],
    "ID": ['FO995877000581098', 'FO995877000581098', 'FO995877000581098', 'FO995877000581098'],
    "NUM": ['9958S135', '9958S135', '9958S105', '9958S105'],
})
I need the following output:
               Code                 ID       NUM
0  9958S135K108MF-1  FO995877000581098  9958S135
3   9958S105K84MF-1  FO995877000581098  9958S105
For every "ID" there should be a unique "NUM". There will be many duplicate "ID" values. The trick is that when dropping rows that share the same "ID" and "NUM", I need to keep the row whose "Code" ends in MF-1 and remove the other one.
I have tried adding a "Mapping" column and deleting the rows where it is True, but True does not always land on the right row: in the output below, row 3, whose "Code" contains 'MF-1', is flagged True and would be deleted.
Here is what I have tried:
import pandas as pd
df['Mapping'] = df['NUM'].eq(df['NUM'].shift()) & df['ID'].eq(df['ID'].shift())
               Code                 ID       NUM  Mapping
0  9958S135K108MF-1  FO995877000581098  9958S135    False
1        9958S135-1  FO995877000581098  9958S135     True
2        9958S105-1  FO995877000581098  9958S105    False
3   9958S105K84MF-1  FO995877000581098  9958S105     True
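A note on why the flag lands on the wrong row: shift() compares each row only with the row directly above it, so the second row of every consecutive pair is marked True no matter what its "Code" looks like. One possible alternative, sketched here under the assumption that each duplicated ("ID", "NUM") pair has at most one MF-1 row, is to sort the MF-1 rows to the front and let drop_duplicates keep the first of each pair. The helper column `_no_mf` is a made-up name used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Code": ['9958S135K108MF-1', '9958S135-1', '9958S105-1', '9958S105K84MF-1'],
    "ID": ['FO995877000581098'] * 4,
    "NUM": ['9958S135', '9958S135', '9958S105', '9958S105'],
})

# Hypothetical helper column: False for MF-1 rows, so they sort first
no_mf = ~df['Code'].str.contains('MF-1', regex=False)
ordered = df.assign(_no_mf=no_mf).sort_values('_no_mf', kind='stable')

# Keep the first row of each (ID, NUM) pair, which is now the MF-1 row if one exists
deduped = (
    ordered.drop_duplicates(['ID', 'NUM'], keep='first')
           .drop(columns='_no_mf')
           .sort_index()  # restore the original row order
)
print(deduped)
```

Unlike the shift() approach, this does not depend on the duplicate rows being adjacent or on which of the two comes first.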
I was able to achieve my outcome using the following:
df[~df.duplicated(['ID', 'NUM'], keep=False) | df['Code'].astype(str).str.contains('MF-1')]
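For anyone who wants to run it, here is a self-contained sketch of the filter above on the sample data, with the two conditions split out. It assumes that 'MF-1' should be matched as literal text (hence `regex=False`, which is an addition here) and that a duplicated pair with no MF-1 row at all does not occur, since such a pair would be dropped entirely.

```python
import pandas as pd

df = pd.DataFrame({
    "Code": ['9958S135K108MF-1', '9958S135-1', '9958S105-1', '9958S105K84MF-1'],
    "ID": ['FO995877000581098'] * 4,
    "NUM": ['9958S135', '9958S135', '9958S105', '9958S105'],
})

# True for every row whose (ID, NUM) pair occurs more than once
is_dup = df.duplicated(['ID', 'NUM'], keep=False)

# True for rows whose Code contains the literal text 'MF-1'
has_mf = df['Code'].astype(str).str.contains('MF-1', regex=False)

# Keep rows that are not duplicated at all, plus the MF-1 row of each duplicated pair
result = df[~is_dup | has_mf]
print(result)
```

Note that str.contains matches 'MF-1' anywhere in "Code"; if only a trailing MF-1 should count, `str.endswith('MF-1')` would be the stricter check.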