Find if part of a string is within a pandas DataFrame

Question

Kind of a confusing question, but I'll explain it thoroughly.

Here is dataframe 1

ID	Name	Letter	Range
16x019CF123	Mike	Aasd	12134
EMU_x123FF2	Lye	BASD	21231
SAT_xFF314C	Rike	GSDAS	21341

Dataframe 2

Index	ID
0	019CF123
1	123FF2
2	FF314C

So now I have two pandas DataFrames.

The ID in DF2 corresponds to the ID in DF1, but only partially.

ID in DF1 | ID in DF2

16x019CF123 | 019CF123 (Notice that the ID in DF2 is just everything after "x" in DF1)

Now, here is what I need to do.

I need to extract the entire rows from DF1 whose IDs are NOT in DF2.

Hope I made it as clear as I can.

Answer 1

You can extract the part of the ID after the 'x' (here using a regex, but you could also split on 'x' and take the last item) and check with isin whether each value is in the reference column. Pass expand=False so that str.extract returns a Series rather than a one-column DataFrame, which would not work as a row mask. Finally, use the result (a Series of booleans) to filter the initial dataframe, after inverting the condition with ~ (to get "not in"):

df1[~df1['ID'].str.extract('(?<=x)(.*$)', expand=False).isin(df2['ID'])]
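
The split alternative mentioned above could look roughly like this. It is a minimal sketch, not a definitive implementation: the frames are rebuilt from the question's sample data, and the last row of df1 (ID 'ABC_x000000', name 'Zed') is a made-up addition so that the result is not empty.

```python
import pandas as pd

# Sample data from the question; the last row of df1 is hypothetical,
# added only so that one ID has no match in df2.
df1 = pd.DataFrame({
    "ID": ["16x019CF123", "EMU_x123FF2", "SAT_xFF314C", "ABC_x000000"],
    "Name": ["Mike", "Lye", "Rike", "Zed"],
    "Letter": ["Aasd", "BASD", "GSDAS", "ZZZ"],
    "Range": [12134, 21231, 21341, 99999],
})
df2 = pd.DataFrame({"ID": ["019CF123", "123FF2", "FF314C"]})

# Split each ID on 'x' and keep the last piece.
suffix = df1["ID"].str.split("x").str[-1]

# Keep the rows whose suffix does not appear in df2['ID'].
not_in_df2 = df1[~suffix.isin(df2["ID"])]
print(not_in_df2)
```

Note that taking the last piece keeps what follows the last 'x', while the regex keeps everything after the first 'x'. The two agree when an ID contains a single 'x', as in the sample data.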

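If you want to better understand how the one-liner works, here is a version with intermediate variables; you can print them to see each step:

# Part of each ID after the first 'x', as a Series (expand=False)
clean_ID = df1['ID'].str.extract('(?<=x)(.*$)', expand=False)
# True where the extracted ID appears in df2['ID']
mask = clean_ID.isin(df2['ID'])
# Keep only the rows whose ID is NOT in df2
df3 = df1[~mask]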
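
To watch the steps run end to end, here is a small self-contained sketch of the version above. It is an illustration under assumptions: the frames are rebuilt from the question's sample data, and the fourth row of df1 (ID 'ABC_x000000') is hypothetical, added only so that one row survives the filter. It also prints what str.extract returns with and without expand=False.

```python
import pandas as pd

# Sample data from the question; the fourth row of df1 is hypothetical.
df1 = pd.DataFrame({
    "ID": ["16x019CF123", "EMU_x123FF2", "SAT_xFF314C", "ABC_x000000"],
    "Name": ["Mike", "Lye", "Rike", "Zed"],
    "Letter": ["Aasd", "BASD", "GSDAS", "ZZZ"],
    "Range": [12134, 21231, 21341, 99999],
})
df2 = pd.DataFrame({"ID": ["019CF123", "123FF2", "FF314C"]})

pattern = r"(?<=x)(.*$)"

# By default str.extract returns a DataFrame, even for a single group.
default_result = df1["ID"].str.extract(pattern)
print(type(default_result).__name__)

# With expand=False and one capture group it returns a Series.
clean_ID = df1["ID"].str.extract(pattern, expand=False)
print(clean_ID.tolist())

# Boolean Series: True where the extracted ID appears in df2['ID'].
mask = clean_ID.isin(df2["ID"])
print(mask.tolist())

# Rows of df1 whose ID is not in df2.
df3 = df1[~mask]
print(df3)
```

With this data, mask is True for the first three rows and False for the hypothetical one, so df3 holds only that last row.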
