pandas: how to keep first or last per column with drop_duplicates
As shown below, for each duplicated name the row must be collapsed so that name keeps its first value and team keeps its last value.
How can I accomplish this with .drop_duplicates() or otherwise?
   name  team  ...
0  john  a     ...
1  mike  b     ...
2  john  c     ...
↓
   name  team  ...
0  john  c     ...
1  mike  b     ...
-- Additional note about comments --
.groupby('name').agg({'team': 'last', 'country': 'first'})
The way it works now, if the first row of country within a group is NaN, a value that is not the first one is returned, as shown below.
Is this because NaN is ignored? Even when first is specified and the first value is NaN, the NaN must still be retained.
   name  team  country  ...
0  john  a     NaN      ...
1  mike  b     Brazil   ...
2  john  c     Canada   ...
↓
   name  team  country  ...
0  john  c     Canada   ...
1  mike  b     Brazil   ...
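A likely explanation, offered as a sketch rather than something confirmed in this thread: the groupby 'first' aggregation returns the first non-null value in each group, so a leading NaN is skipped. The frame below is an assumed reconstruction of the tables above, with NaN treated as a real missing value. A positional lambda (s.iloc[0]) is one way to keep the NaN.

```python
import numpy as np
import pandas as pd

# Assumed sample data, rebuilt from the tables in the question.
df = pd.DataFrame({
    "name": ["john", "mike", "john"],
    "team": ["a", "b", "c"],
    "country": [np.nan, "Brazil", "Canada"],
})

# 'first' skips missing values, so john's country becomes "Canada".
skipping = (
    df.groupby("name")
      .agg({"team": "last", "country": "first"})
      .reset_index()
)

# Taking the first row by position keeps the NaN for john.
keeping = (
    df.groupby("name")
      .agg({"team": "last", "country": lambda s: s.iloc[0]})
      .reset_index()
)

print(skipping)
print(keeping)
```

The lambda is slower than the built-in aggregation on large frames, so it is worth using only for the columns where NaN has to survive.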
You can use the .groupby() function:
df.groupby('name').agg({'team': 'last'})
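As a runnable illustration of the groupby approach, here is a minimal sketch on an assumed two-column frame built from the question's first table. The reset_index() call and the drop_duplicates comparison are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Assumed sample data, rebuilt from the first table in the question.
df = pd.DataFrame({
    "name": ["john", "mike", "john"],
    "team": ["a", "b", "c"],
})

# One row per name, taking the last team seen for that name.
result = df.groupby("name").agg({"team": "last"}).reset_index()
print(result)

# If every column should take its last value, drop_duplicates is enough.
# It keeps the original index labels and row order instead of grouping.
last_rows = df.drop_duplicates(subset="name", keep="last")
print(last_rows)
```

drop_duplicates applies a single keep rule to the whole row, so it cannot mix first for one column with last for another; the per-column dictionary passed to agg is what allows that.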
Be aware that the value returned per name depends on the row order of your DataFrame.
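To illustrate the dependence on row order, here is a small sketch on assumed sample data: the same aggregation gives a different team for john once the rows are sorted differently. Sorting by team in descending order is only an example of reordering, not something the question requires.

```python
import pandas as pd

# Assumed sample data, rebuilt from the first table in the question.
df = pd.DataFrame({
    "name": ["john", "mike", "john"],
    "team": ["a", "b", "c"],
})

# Original row order: john's rows are a, c, so 'last' gives c.
original = df.groupby("name").agg({"team": "last"})

# After sorting by team descending: john's rows are c, a, so 'last' gives a.
reordered = (
    df.sort_values("team", ascending=False)
      .groupby("name")
      .agg({"team": "last"})
)

print(original)
print(reordered)
```

If the intended meaning of first and last comes from some column, such as a timestamp, sorting by that column explicitly before grouping makes the result deterministic.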