How to drop duplicates in a data frame and keep first with two exceptions?
Keep unique values of a column in a data frame WITHOUT using drop duplicates
I have a data frame:
excel1 user_id public_key first_seen
0 Mark key1 1/14/2015 11:51:41 PM
1 Mark key2 1/14/2015 11:51:41 PM
2 Mark key3 1/14/2015 11:51:41 PM
3 Rhonda key4 2/16/2015 2:16:04 PM
4 Rhonda key5 2/16/2015 2:16:04 PM
5 Rhonda key6 2/16/2015 2:16:04 PM
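I want to keep every row but remove the duplicate entries in the first_seen column, so that only the first row for each user shows the value: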
excel1 user_id public_key first_seen
0 Mark key1 1/14/2015 11:51:41 PM
1 Mark key2
2 Mark key3
3 Rhonda key4 2/16/2015 2:16:04 PM
4 Rhonda key5
5 Rhonda key6
This happens because I am running pd.merge on two CSV files:
merged_df = pd.merge(output_df, read_df, left_on="user_id", right_on="user_id_left", how="inner").drop_duplicates(
subset=['body'], keep='first')
I tried the .filter() and .query() methods on the final data frame but could not get the desired result. How can I get the desired df?
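IIUC, you can use drop_duplicates and assign the result back to the column; rows that were dropped have no matching index label, so they become NaN: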
df['first_seen'] = df.drop_duplicates(['user_id', 'first_seen'])['first_seen']
Output:
excel1 user_id public_key first_seen
0 0 Mark key1 1/14/2015 11:51:41 PM
1 1 Mark key2 NaN
2 2 Mark key3 NaN
3 3 Rhonda key4 2/16/2015 2:16:04 PM
4 4 Rhonda key5 NaN
5 5 Rhonda key6 NaN
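To try this without the original CSV files, here is a minimal sketch that rebuilds the sample frame by hand (the excel1 column is omitted, and the names via_drop, via_mask, is_repeat and blanked are only illustrative). It reproduces the drop_duplicates assignment above and, since the title asks for a way without drop_duplicates, also shows one possible alternative based on duplicated() and Series.mask(). Treat it as an illustration under those assumptions rather than the only way to do it.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (excel1 column left out).
df = pd.DataFrame({
    "user_id": ["Mark", "Mark", "Mark", "Rhonda", "Rhonda", "Rhonda"],
    "public_key": ["key1", "key2", "key3", "key4", "key5", "key6"],
    "first_seen": ["1/14/2015 11:51:41 PM"] * 3 + ["2/16/2015 2:16:04 PM"] * 3,
})

# Approach from the answer: drop_duplicates keeps only the first row per
# (user_id, first_seen) pair. Assigning the shorter Series back aligns on
# the index, so the rows that were dropped receive NaN.
via_drop = df.copy()
via_drop["first_seen"] = via_drop.drop_duplicates(["user_id", "first_seen"])["first_seen"]

# Alternative without drop_duplicates: duplicated() flags every repeat
# after the first occurrence, and mask() replaces the flagged values with NaN.
is_repeat = df.duplicated(subset=["user_id", "first_seen"], keep="first")
via_mask = df.copy()
via_mask["first_seen"] = via_mask["first_seen"].mask(is_repeat)

# Optional: use empty strings instead of NaN to match the desired layout.
blanked = via_mask["first_seen"].fillna("")

print(via_drop)
print(via_mask)
```

Both variants keep all six rows and leave first_seen filled only on the first row of each user. If blank cells are preferred over NaN, as in the desired output of the question, the fillna("") step at the end does that.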