I have 2 data frames. The 1st contains a list of values I want to work with, and the 2nd contains these values plus a large number of other values. I am looking for the best way to remove from the 2nd data frame the values that do not appear in the 1st data frame, to reduce the number of entries I am working with.
Example
Input
DF1
Alpha | code |
---|---|
A | 1 |
D | 2 |
E | 3 |
F | 4 |
DF2
Alpha | code |
---|---|
A | 23 |
B | 12 |
C | 1 |
D | 32 |
E | 23 |
F | 45 |
G | 51 |
H | 26 |
Desired Output:
DF1
Alpha | code |
---|---|
A | 1 |
D | 2 |
E | 3 |
F | 4 |
DF2
Alpha | code |
---|---|
A | 23 |
D | 32 |
E | 23 |
F | 45 |
Assuming that your first column in DF1 is called "Alpha", you can do this:
my_list_DF1 = DF1['Alpha'].unique().tolist() # gets all unique values of first column from DF1 into a list
Then you can filter DF2 to include only those values, using isin:
new_DF2 = DF2[DF2['Alpha'].isin(my_list_DF1)]
This results in a smaller DF2 that keeps only the rows whose 'Alpha' value also appears in DF1.
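For reference, here is a minimal runnable sketch of the isin approach on the example data from the question. The frames are rebuilt by hand here, and the `reset_index(drop=True)` call is an optional addition that is not part of the answer above; it only renumbers the rows after filtering.

```python
import pandas as pd

# Example data copied from the question
DF1 = pd.DataFrame({"Alpha": ["A", "D", "E", "F"], "code": [1, 2, 3, 4]})
DF2 = pd.DataFrame({
    "Alpha": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "code": [23, 12, 1, 32, 23, 45, 51, 26],
})

# Collect the unique keys from DF1
my_list_DF1 = DF1["Alpha"].unique().tolist()

# Keep only the DF2 rows whose 'Alpha' value is one of those keys
new_DF2 = DF2[DF2["Alpha"].isin(my_list_DF1)]

# Optional: renumber the rows 0..n-1 after filtering
new_DF2 = new_DF2.reset_index(drop=True)

print(new_DF2)
```

Note that isin also accepts a Series directly, so `DF2['Alpha'].isin(DF1['Alpha'])` should give the same mask without building the intermediate list.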
You could also do an inner join, which drops all rows that don't have a matching entry in both frames and merges the rest:
pd.merge(DF1, DF2, on='Alpha', how='inner')
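But then you would have to drop the columns you don't need afterwards, and possibly rename some if they share a name. In this example both frames have a 'code' column, so by default pandas would name them 'code_x' and 'code_y' in the result.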
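A sketch of the merge approach on the same example data, including the cleanup step. The suffix names `_df1` and `_df2` are chosen here for illustration only. The last line shows one possible way to sidestep the cleanup, by merging DF2 against just the key column of DF1; that variant is an assumption about what is wanted, not something the answer above proposes.

```python
import pandas as pd

# Example data copied from the question
DF1 = pd.DataFrame({"Alpha": ["A", "D", "E", "F"], "code": [1, 2, 3, 4]})
DF2 = pd.DataFrame({
    "Alpha": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "code": [23, 12, 1, 32, 23, 45, 51, 26],
})

# Inner join on 'Alpha'; explicit suffixes distinguish the two 'code' columns
merged = pd.merge(DF1, DF2, on="Alpha", how="inner", suffixes=("_df1", "_df2"))

# Keep only DF2's values and restore the original column name
new_DF2 = merged[["Alpha", "code_df2"]].rename(columns={"code_df2": "code"})

# Variant: merge against only the key column, so no extra columns appear
new_DF2_alt = DF2.merge(DF1[["Alpha"]].drop_duplicates(), on="Alpha", how="inner")

print(new_DF2)
```

The `drop_duplicates()` call matters if DF1 can contain the same key more than once, since a join would otherwise repeat the matching DF2 rows.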