
pandas: how to choose first or last per column for keep in drop_duplicates

As shown below, duplicate names should be collapsed so that each name keeps its first row, while team takes the value from that name's last row.

How can I accomplish this with .drop_duplicates() or some other way?

   name  team ...
0  john  a    ...
1  mike  b    ...
2  john  c    ...

↓

   name  team ...
0  john  c    ...
1  mike  b    ...
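A minimal sketch of one way to do this with drop_duplicates alone, assuming the sample data above: look up team from each name's last row, then keep each name's first row and overwrite its team with that value. Because drop_duplicates picks rows by position, a NaN in the last row would be kept rather than skipped.

```python
import pandas as pd

df = pd.DataFrame({"name": ["john", "mike", "john"], "team": ["a", "b", "c"]})

# team value taken from the LAST row of each name, indexed by name
last_team = df.drop_duplicates("name", keep="last").set_index("name")["team"]

# keep the FIRST row of each name, then replace team with the last-row value
result = df.drop_duplicates("name", keep="first").copy()
result["team"] = result["name"].map(last_team)
print(result)
```

The result keeps the original index of each name's first row (0 for john, 1 for mike), matching the desired output above.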

-- Additional note in response to comments --

df.groupby('name').agg({'team': 'last', 'country': 'first'})

With this approach, if the first country value for a name is NaN, the result is not that first value: the NaN is skipped and a later value is returned instead, as shown below.

Is this because NaN values are ignored? Even when first is specified and the first value is NaN, I need the NaN to be kept.

   name  team  country ...
0  john  a     NaN     ...
1  mike  b     Brazil  ...
2  john  c     Canada  ...

↓

   name  team  country ...
0  john  c     Canada  ...
1  mike  b     Brazil  ...
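GroupBy.first and GroupBy.last return the first and last non-null value in each group, which is why the NaN is skipped here. One way to take values strictly by position is a lambda using .iloc. This is a minimal sketch, assuming the sample data above (with the NaN written as np.nan) and a pandas version that supports named aggregation (0.25 or later).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "mike", "john"],
    "team": ["a", "b", "c"],
    "country": [np.nan, "Brazil", "Canada"],
})

# .iloc[0] / .iloc[-1] select by position, so a NaN in that position is kept,
# unlike the 'first' / 'last' aggregations, which skip NaN
result = df.groupby("name").agg(
    team=("team", lambda s: s.iloc[-1]),
    country=("country", lambda s: s.iloc[0]),
)
print(result)
```

Here john gets team c and country NaN, and mike gets team b and country Brazil.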

You can use the .groupby() method:

df.groupby('name').agg({'team': 'last'})

Be aware that the value returned for each name depends on the row order of your DataFrame.
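To illustrate the order dependence, this sketch runs the same aggregation on the sample rows in their original and reversed order. The seq column is hypothetical, standing in for whatever column defines which row counts as "last" in real data; sorting on it with a stable sort before grouping makes the result independent of how the rows happen to be stored.

```python
import pandas as pd

# 'seq' is a hypothetical ordering column, not part of the original data
df = pd.DataFrame({"name": ["john", "mike", "john"], "team": ["a", "b", "c"], "seq": [1, 2, 3]})

as_is = df.groupby("name").agg({"team": "last"})                 # john -> c
reversed_rows = df.iloc[::-1].groupby("name").agg({"team": "last"})  # john -> a

# sorting by an explicit key first makes "last" well defined again
stable = (
    df.iloc[::-1]
    .sort_values("seq", kind="stable")
    .groupby("name")
    .agg({"team": "last"})
)
print(as_is, reversed_rows, stable, sep="\n\n")
```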
