How to drop duplicates in a data frame and keep first with two exceptions?
Keep unique values of a column in a data frame WITHOUT using drop duplicates
I have a data frame:
excel1  user_id  public_key  first_seen
0       Mark     key1        1/14/2015 11:51:41 PM
1       Mark     key2        1/14/2015 11:51:41 PM
2       Mark     key3        1/14/2015 11:51:41 PM
3       Rhonda   key4        2/16/2015 2:16:04 PM
4       Rhonda   key5        2/16/2015 2:16:04 PM
5       Rhonda   key6        2/16/2015 2:16:04 PM
I want to keep all the rows but remove the duplicate entries in the first_seen column:
excel1  user_id  public_key  first_seen
0       Mark     key1        1/14/2015 11:51:41 PM
1       Mark     key2
2       Mark     key3
3       Rhonda   key4        2/16/2015 2:16:04 PM
4       Rhonda   key5
5       Rhonda   key6
This comes up because I am running pd.merge on two CSV files:
merged_df = pd.merge(output_df, read_df, left_on="user_id", right_on="user_id_left", how="inner").drop_duplicates(
    subset=['body'], keep='first')
I tried the .filter() and .query() methods on the final data frame, but could not get the desired result. How can I get the desired df?
IIUC, you can use drop_duplicates:
df['first_seen'] = df.drop_duplicates(['user_id', 'first_seen'])['first_seen']
Output:
   excel1  user_id  public_key  first_seen
0  0       Mark     key1        1/14/2015 11:51:41 PM
1  1       Mark     key2        NaN
2  2       Mark     key3        NaN
3  3       Rhonda   key4        2/16/2015 2:16:04 PM
4  4       Rhonda   key5        NaN
5  5       Rhonda   key6        NaN
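
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The data frame is rebuilt by hand from the sample in the question. The names kept, via_drop, is_repeat, via_mask and blank are made up for illustration. The first approach works because drop_duplicates keeps the original index, so assigning the result back aligns on that index and the dropped rows get NaN. The second approach is an alternative that is not from the answer above: it gets the same result without drop_duplicates by using duplicated() and Series.mask().

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "excel1": [0, 1, 2, 3, 4, 5],
    "user_id": ["Mark", "Mark", "Mark", "Rhonda", "Rhonda", "Rhonda"],
    "public_key": ["key1", "key2", "key3", "key4", "key5", "key6"],
    "first_seen": ["1/14/2015 11:51:41 PM"] * 3 + ["2/16/2015 2:16:04 PM"] * 3,
})

# Approach from the answer: keep only the first row of each
# (user_id, first_seen) pair, then assign that column back.
# Index alignment leaves NaN in the rows that were dropped.
kept = df.drop_duplicates(["user_id", "first_seen"])["first_seen"]
via_drop = df.assign(first_seen=kept)

# Alternative (an assumption, not from the answer): flag repeated
# pairs with duplicated() and blank them out with mask().
is_repeat = df.duplicated(subset=["user_id", "first_seen"], keep="first")
via_mask = df.assign(first_seen=df["first_seen"].mask(is_repeat))

# Optional: show empty strings instead of NaN, as in the desired output.
blank = via_mask.fillna({"first_seen": ""})

print(via_drop)
print(blank)
```

Both versions leave every row in place and only clear the repeated first_seen values. If first_seen should stay a datetime column, NaN/NaT is probably the better placeholder, since filling with empty strings keeps the column as text.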