Kind of a confusing question, but I'll explain it thoroughly.
Here is dataframe 1
ID | Name | Letter | Range |
---|---|---|---|
16x019CF123 | Mike | Aasd | 12134 |
EMU_x123FF2 | Lye | BASD | 21231 |
SAT_xFF314C | Rike | GSDAS | 21341 |
Dataframe 2
Index | ID |
---|---|
0 | 019CF123 |
1 | 123FF2 |
2 | FF314C |
So now I have two pandas DataFrames.
The ID in DF2 corresponds to the ID in DF1, but not exactly.
ID in DF1 | ID in DF2 |
---|---|
16x019CF123 | 019CF123 |
(Notice that the ID in DF2 is just everything after "x" in DF1.)
Now, here is what I need to do.
I need to extract the entire rows from DF1 whose IDs are NOT in DF2.
Hope I made it as clear as I can.
You can extract the ID after the 'x' (here using a regex, but you could also split on 'x' and take the last item) and check whether each value is in the reference column with isin.
Finally, use this information (a Series of booleans) to slice the initial dataframe, after inverting the condition with ~ (to get "not in"):
df1[~df1['ID'].str.extract('(?<=x)(.*$)', expand=False).isin(df2['ID'])]
If you want to better understand how this works, here is a version with intermediate variables, you can print them to see the steps:
clean_ID = df1['ID'].str.extract('(?<=x)(.*$)', expand=False)
mask = clean_ID.isin(df2['ID'])
df3 = df1[~mask]
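For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two dataframes from the question. Note two assumptions that are not in the original post: the fourth row of df1 (`ZZZ_xABCDEF`) is a made-up row added only so the result is not empty, and `expand=False` is passed so that `str.extract` returns a Series rather than a one-column DataFrame (the default in recent pandas versions). The split-based alternative mentioned above is included for comparison.

```python
import pandas as pd

# Sample data from the question. The last row ("ZZZ_xABCDEF") is hypothetical,
# added only so that at least one ID from df1 is missing in df2.
df1 = pd.DataFrame({
    "ID": ["16x019CF123", "EMU_x123FF2", "SAT_xFF314C", "ZZZ_xABCDEF"],
    "Name": ["Mike", "Lye", "Rike", "Extra"],
    "Letter": ["Aasd", "BASD", "GSDAS", "QWE"],
    "Range": [12134, 21231, 21341, 99999],
})
df2 = pd.DataFrame({"ID": ["019CF123", "123FF2", "FF314C"]})

# Keep everything after the first "x"; expand=False yields a Series.
clean_id = df1["ID"].str.extract(r"(?<=x)(.*$)", expand=False)

# True where the cleaned ID appears in df2; invert to keep the rows that do not.
mask = clean_id.isin(df2["ID"])
df3 = df1[~mask]

# Alternative: split on "x" and take the last piece.
clean_id_split = df1["ID"].str.split("x").str[-1]
df3_split = df1[~clean_id_split.isin(df2["ID"])]

print(df3)
```

The two variants agree on this data because each ID contains exactly one lowercase "x". If an ID could contain several, the regex keeps everything after the first "x" while the split keeps only what follows the last one, so pick whichever matches your ID format.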