
Drop records from a df that are not in another df using Python

I have a sample dataframe, df1:

date           username         cities
2021-03-01     K John           New york
2021-03-01     K John           LA
2021-03-02     Ken Miles        Florida
2021-03-02     Ken Miles        LA

The second dataframe, df2, contains:

date          username        planned_cities 
2021-03-01    K John             Alabama
2021-03-02    K John             LA
2021-03-02    Ken Miles          Florida
2021-03-02    Ken Miles          California

Expected result (matching only on date and username, and dropping the rows of df2 whose date/username pair does not appear in df1):

date         username        planned_cities
2021-03-01    K John             Alabama
2021-03-02    Ken Miles          Florida
2021-03-02    Ken Miles          California

Since 2021-03-02 K John does not appear in the records of df1, that row is dropped. How can I achieve this?

You could use Index.isin on a MultiIndex built from the columns you are interested in, and then filter with the resulting boolean mask:

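import pandas as pd
cols = ['date', 'username']
# Build one (date, username) key per row of each frame
idx1 = pd.MultiIndex.from_frame(df1[cols])
idx2 = pd.MultiIndex.from_frame(df2[cols])
# Keep only the df2 rows whose key also occurs in df1
out = df2[idx2.isin(idx1)]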

       date   username planned_cities
  2021-03-01     K John        Alabama
  2021-03-02  Ken Miles        Florida
  2021-03-02  Ken Miles     California
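
For anyone who wants to run this end to end, here is a minimal sketch against the sample data from the question. The frame construction (with dates kept as plain strings) and the helper name keep_matching_rows are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as plain strings).
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})


def keep_matching_rows(left, right, cols):
    """Hypothetical helper: keep rows of `left` whose `cols` values occur in `right`."""
    left_keys = pd.MultiIndex.from_frame(left[cols])
    right_keys = pd.MultiIndex.from_frame(right[cols])
    # isin yields a boolean array with one entry per row of `left`.
    return left[left_keys.isin(right_keys)]


out = keep_matching_rows(df2, df1, ["date", "username"])
print(out)
```

Boolean indexing keeps the original row labels of df2 (here 0, 2 and 3); chain .reset_index(drop=True) if a fresh 0..n-1 index is preferred.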

Use an inner merge against the de-duplicated key columns of df1. Dropping the duplicates first ensures the merge does not add rows to the left DataFrame.

df2.merge(df1[['date', 'username']].drop_duplicates())

         date   username planned_cities
0  2021-03-01     K John        Alabama
1  2021-03-02  Ken Miles        Florida
2  2021-03-02  Ken Miles     California
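
To see why the drop_duplicates step matters, the following sketch compares the merge with and without it on the question's sample data. The frame construction and the variable names naive and filtered are assumptions made for illustration, and the on= argument is spelled out only for clarity.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as plain strings).
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

keys = ["date", "username"]

# Without de-duplication, each df2 row is repeated once per matching df1 row.
naive = df2.merge(df1[keys], on=keys)

# De-duplicating the key columns first makes the inner merge act as a pure filter.
filtered = df2.merge(df1[keys].drop_duplicates(), on=keys)

print(len(naive), len(filtered))  # 6 3
```

Each key pair occurs twice in df1, so the naive merge doubles every matching df2 row, while the de-duplicated version returns exactly the three expected rows.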
