
Find if part of a string is within a pandas DataFrame

Kind of a confusing question, but I'll explain it thoroughly.

Here is dataframe 1

ID           Name  Letter  Range
16x019CF123  Mike  Aasd    12134
EMU_x123FF2  Lye   BASD    21231
SAT_xFF314C  Rike  GSDAS   21341

Dataframe 2

Index  ID
0      019CF123
1      123FF2
2      FF314C

So now I have two pandas DataFrames.

The ID in DF2 corresponds to the ID in DF1, but not exactly.

ID in DF1   | ID in DF2
16x019CF123 | 019CF123

(Notice that the ID in DF2 is just everything after "x" in DF1.)

Now, here is what I need to do.

I need to extract the entire rows of DF1 whose IDs are NOT in DF2.

Hope I made it as clear as I can.

You can extract the part of the ID after the 'x' (here using a regex, but you could also split on 'x' and take the last item) and check with isin whether each value appears in the reference column. This gives a Series of booleans; invert it with ~ (to get "not in") and use it to slice the initial dataframe:

df1[~df1['ID'].str.extract(r'(?<=x)(.*$)', expand=False).isin(df2['ID'])]
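The split-based alternative mentioned above could look roughly like the sketch below. The sample frames are rebuilt from the question, except that df2 here deliberately leaves out FF314C (an assumption made only so the result is not empty). Note one difference: taking the last item of the split keeps everything after the last 'x', while the regex keeps everything after the first 'x'; the two agree as long as each ID contains a single 'x'.

```python
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    "ID": ["16x019CF123", "EMU_x123FF2", "SAT_xFF314C"],
    "Name": ["Mike", "Lye", "Rike"],
    "Letter": ["Aasd", "BASD", "GSDAS"],
    "Range": [12134, 21231, 21341],
})

# Assumption for illustration only: FF314C is left out of df2
# so that one row of df1 has no match.
df2 = pd.DataFrame({"ID": ["019CF123", "123FF2"]})

# Split each ID on 'x' and keep the last piece.
suffix = df1["ID"].str.split("x").str[-1]

# Keep the rows of df1 whose suffix does not appear in df2['ID'].
result = df1[~suffix.isin(df2["ID"])]
print(result)
```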

If you want to better understand how this works, here is a version with intermediate variables. You can print them to see each step:

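# part of the ID after the 'x'; expand=False returns a Series rather than a one-column DataFrame
clean_ID = df1['ID'].str.extract(r'(?<=x)(.*$)', expand=False)
# True where the extracted ID is present in df2
mask = clean_ID.isin(df2['ID'])
# keep only the rows of df1 that are NOT in df2
df3 = df1[~mask]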
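To try these steps end to end, here is a minimal self-contained sketch. With the exact data from the question every ID in DF1 has a match in DF2, so the result would be empty; the fourth row of df1 below is a hypothetical addition (its ID, name, and values are made up) that has no counterpart in df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    # The last entry in each list is a hypothetical extra row, not from the question.
    "ID": ["16x019CF123", "EMU_x123FF2", "SAT_xFF314C", "ABC_x0000AA"],
    "Name": ["Mike", "Lye", "Rike", "Zed"],
    "Letter": ["Aasd", "BASD", "GSDAS", "QWE"],
    "Range": [12134, 21231, 21341, 99999],
})
df2 = pd.DataFrame({"ID": ["019CF123", "123FF2", "FF314C"]})

# Everything after the first 'x', returned as a Series (expand=False).
clean_ID = df1["ID"].str.extract(r"(?<=x)(.*$)", expand=False)
print(clean_ID)

# Boolean Series: True where the extracted ID is present in df2['ID'].
mask = clean_ID.isin(df2["ID"])
print(mask)

# Rows of df1 whose ID is not in df2.
df3 = df1[~mask]
print(df3)
```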
