How do you remove values from a data frame based on whether they are present in another data frame?

Question

I have two data frames. The first contains a list of values I want to work with, and the second contains these values plus a large number of other values. I am looking for the best way to remove the values that do not appear in the first data frame from the second data frame, to reduce the number of entries I am working with.

Example

Input

DF1

Alpha	code
A	1
D	2
E	3
F	4

DF2

Alpha	code
A	23
B	12
C	1
D	32
E	23
F	45
G	51
H	26

Desired Output:

DF1

Alpha	code
A	1
D	2
E	3
F	4

DF2

Alpha	code
A	23
D	32
E	23
F	45

Answer 1

Assuming that your first column in DF1 is called "Alpha", you can do this:

my_list_DF1 = DF1['Alpha'].unique().tolist() # gets all unique values of first column from DF1 into a list

Then you can filter DF2 to include only those values using isin:

new_DF2 = DF2[DF2['Alpha'].isin(my_list_DF1)]

This results in a smaller DF2 that only includes the rows whose 'Alpha' value also appears in DF1.
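A minimal runnable sketch of the approach above, assuming pandas is installed and rebuilding DF1 and DF2 from the example in the question. The name alt_DF2 is hypothetical and only shows that isin also accepts a Series directly, so the intermediate list is optional:

```python
import pandas as pd

# Example data copied from the question
DF1 = pd.DataFrame({"Alpha": ["A", "D", "E", "F"], "code": [1, 2, 3, 4]})
DF2 = pd.DataFrame({
    "Alpha": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "code": [23, 12, 1, 32, 23, 45, 51, 26],
})

# Collect the allowed keys from DF1 into a list (as in the answer)
my_list_DF1 = DF1["Alpha"].unique().tolist()

# Keep only the DF2 rows whose 'Alpha' value is in that list.
# The boolean mask keeps DF2's original index labels (0, 3, 4, 5);
# reset_index(drop=True) just renumbers the rows from 0.
new_DF2 = DF2[DF2["Alpha"].isin(my_list_DF1)].reset_index(drop=True)

# Hypothetical variant: isin also accepts a Series, so the list is optional
alt_DF2 = DF2[DF2["Alpha"].isin(DF1["Alpha"])].reset_index(drop=True)

print(new_DF2)
```

The reset_index call is a matter of taste; drop it if you want to keep track of where each row sat in the original DF2.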

Answer 2

You could do an inner join, which drops all rows that don't have a matching entry in both data frames and merges the rest:

pd.merge(DF1, DF2, on='Alpha', how='inner')

But you would then have to drop the columns you don't need, and possibly rename any that share a name (by default, pandas suffixes overlapping column names with _x and _y).
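A sketch of the merge approach with that cleanup, again rebuilding the example frames from the question and assuming pandas is installed. It also shows one possible alternative (not part of the original answer): merging DF2 against only the key column of DF1, which avoids the column clash altogether. The names merged, keys and new_DF2_alt are hypothetical:

```python
import pandas as pd

# Example data copied from the question
DF1 = pd.DataFrame({"Alpha": ["A", "D", "E", "F"], "code": [1, 2, 3, 4]})
DF2 = pd.DataFrame({
    "Alpha": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "code": [23, 12, 1, 32, 23, 45, 51, 26],
})

# Inner join on 'Alpha': both frames have a 'code' column, so pandas
# renames them to code_x (from DF1) and code_y (from DF2).
merged = pd.merge(DF1, DF2, on="Alpha", how="inner")
print(merged.columns.tolist())  # ['Alpha', 'code_x', 'code_y']

# Drop DF1's column and restore DF2's original column name.
new_DF2 = merged.drop(columns="code_x").rename(columns={"code_y": "code"})

# Hypothetical alternative: join DF2 against only DF1's key column, so no
# columns clash. drop_duplicates guards against repeated keys in DF1,
# which would otherwise duplicate the matching DF2 rows.
keys = DF1[["Alpha"]].drop_duplicates()
new_DF2_alt = pd.merge(DF2, keys, on="Alpha", how="inner")

print(new_DF2_alt)
```

The duplicate guard matters because an inner merge produces one output row per matching key pair, whereas the isin filter never duplicates rows.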

solution1
1 ACCEPTED 2020-12-21 16:10:03

solution2
0 2020-12-21 16:24:27