Drop records from a df that are not in another df using Python
I have a sample dataframe, df1:
date username cities
2021-03-01 K John New york
2021-03-01 K John LA
2021-03-02 Ken Miles Florida
2021-03-02 Ken Miles LA
The second dataframe, df2, contains:
date username planned_cities
2021-03-01 K John Alabama
2021-03-02 K John LA
2021-03-02 Ken Miles Florida
2021-03-02 Ken Miles California
Expected result (matching only on date and username, and dropping the rows of df2 whose combination does not appear in df1):
date username planned_cities
2021-03-01 K John Alabama
2021-03-02 Ken Miles Florida
2021-03-02 Ken Miles California
Since 2021-03-02 K John is not in the records of df1, it is dropped. How can I achieve this?
You could use Index.isin with the columns you are interested in and then use boolean indexing:
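import pandas as pd

cols = ['date', 'username']
# Build a MultiIndex of (date, username) pairs for each frame
idx1 = pd.MultiIndex.from_frame(df1[cols])
idx2 = pd.MultiIndex.from_frame(df2[cols])
# Keep only the rows of df2 whose pair also occurs in df1
out = df2[idx2.isin(idx1)]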
date username planned_cities
2021-03-01 K John Alabama
2021-03-02 Ken Miles Florida
2021-03-02 Ken Miles California
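For anyone who wants to try this end to end, here is a minimal self-contained sketch of the isin approach. The frames are rebuilt by hand from the sample data in the question, and dates are kept as plain strings, which is an assumption; the variable name mask is only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question (dates kept as strings: an assumption)
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

cols = ["date", "username"]

# One (date, username) tuple per row of each frame
idx1 = pd.MultiIndex.from_frame(df1[cols])
idx2 = pd.MultiIndex.from_frame(df2[cols])

# Boolean array: True where the df2 pair also occurs somewhere in df1
mask = idx2.isin(idx1)

# Boolean indexing keeps the matching rows and their original index labels
out = df2[mask]
print(out)
```

Note that the surviving rows keep their original index labels from df2; call out.reset_index(drop=True) if a fresh 0..n-1 index is wanted.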
Use an inner merge on the key columns of df1 after dropping duplicates; that way you ensure the left DataFrame does not grow.
df2.merge(df1[['date', 'username']].drop_duplicates())
date username planned_cities
0 2021-03-01 K John Alabama
1 2021-03-02 Ken Miles Florida
2 2021-03-02 Ken Miles California
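To see why the drop_duplicates step matters, the sketch below runs the merge with and without it on the sample data from the question. The frames are rebuilt by hand with dates as plain strings (an assumption), and the names keys, merged and naive are only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question (dates kept as strings: an assumption)
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

# Unique (date, username) pairs present in df1
keys = df1[["date", "username"]].drop_duplicates()

# merge defaults to how="inner" on the shared column names, so each df2 row
# survives at most once because every key is unique on the right side
merged = df2.merge(keys)

# Without drop_duplicates, each df2 row is repeated once per matching df1 row
naive = df2.merge(df1[["date", "username"]])

print(len(merged), len(naive))
```

Unlike boolean indexing, merge returns a new frame with a fresh 0..n-1 index rather than the original labels of df2.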