简体   繁体   中英

How do you remove values from a data frame based on whether they are present in another data frame?

I have 2 data frames, the 1st contains a list of values I am looking to work with and the second contains these values plus a large number of other values. I am looking for the best way to remove the values that do not appear in the 1st data frame from the 2nddata frame to reduce the number of entries I am working with.

Example

Input

DF1

Alpha code
A 1
D 2
E 3
F 4

DF2

Alpha code
A 23
B 12
C 1
D 32
E 23
F 45
G 51
H 26

Desired Output:

DF1

Alpha code
A 1
D 2
E 3
F 4

DF2

Alpha code
A 23
D 32
E 23
F 45

Assuming that your first column in DF1 is called "Alpha", you can do this:

my_list_DF1 = DF1['Alpha'].unique().tolist() # gets all unique values of first column from DF1 into a list

Then, you can filter your DF2, to include only those values, using isin :

new_DF2 = DF2[DF2['Alpha'].isin(my_list_DF1)]

Which will result in a smaller DF2, only including the common values from the so called 'Alpha' column.

You could do an inner join, dropping all rows that doesn't have an entry and merging all others:

pd.merge(DF1, DF2, on='Alpha', how='inner')

But then you would subsequently have to drop the columns you dont need, and posibly rename if some share a name.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM