Pandas nlargest with NaN in my data returns more than n rows

I have a DataFrame that looks like this:

  Name  Score1  Score2  Score3
0    A      98      72      99
1    A      98      84      91
2    B      34      20      81
3    A      98      93      88
4    B      68      97      12
5    A     NaN      72     NaN

I want to group by Name, then take the top 2 rows by Score1; if Score1 has duplicate values, break the tie by taking whichever row has the larger Score2. Expected output:

  Name  Score1  Score2  Score3
0    A      98      93      88
1    A      98      84      91
2    B      68      97      12
3    B      34      20      81

I tried df.groupby("Name").apply(lambda x: x.nlargest(2, ["Score1", "Score2"])).reset_index(drop=True). What I get is:

  Name  Score1  Score2  Score3
0    A      98      93      88
1    A      98      84      91
2    A      98      72      99
3    A     NaN      72     NaN
4    B      68      97      12
5    B      34      20      81

I found that, because of the NaN, it returns more than 2 rows for Name A. Is dropna the only way to fix it?

You can also do it like this:

out = df.sort_values(['Score1', 'Score2'], ascending=False).groupby('Name').head(2)
print(out)
  Name  Score1  Score2  Score3
3    A    98.0      93    88.0
1    A    98.0      84    91.0
4    B    68.0      97    12.0
2    B    34.0      20    81.0
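This works because sort_values places NaN last by default (na_position='last'), whether the sort is ascending or descending, so head(2) only reaches a NaN row when a group has fewer than two valid Score1 values. Below is a self-contained sketch of the same approach; the hand-built frame is an assumption about how the question's data was created.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (assumed construction).
df = pd.DataFrame(
    {
        "Name": ["A", "A", "B", "A", "B", "A"],
        "Score1": [98, 98, 34, 98, 68, np.nan],
        "Score2": [72, 84, 20, 93, 97, 72],
        "Score3": [99, 91, 81, 88, 12, np.nan],
    }
)

# Sort by Score1, then Score2, both descending; NaN rows sink to the bottom.
ranked = df.sort_values(["Score1", "Score2"], ascending=False)

# Keep the first two rows of each Name group, preserving the sorted order.
out = ranked.groupby("Name").head(2)
print(out)
```

Add .reset_index(drop=True) at the end if you want a fresh 0..n-1 index like the one in the expected output.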

You can try filling the NaN values or dropping those rows before using nlargest.

cols = ["Score1", "Score2"]

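df[cols] = df[cols].fillna(0)  # fillna needs a fill value; 0 is an assumption that scores are never negative
# df = df.dropna(subset=cols)  # alternative: drop the rows with NaN instead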

out = df.groupby("Name").apply(lambda g: g.nlargest(2, cols)).reset_index(drop=True)
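The dropna route can also be written without groupby.apply, which may sidestep version-specific differences in how apply treats the grouping column. This is a minimal sketch, not the answerer's exact code: the hand-built frame and the names clean and top2 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (assumed construction).
df = pd.DataFrame(
    {
        "Name": ["A", "A", "B", "A", "B", "A"],
        "Score1": [98, 98, 34, 98, 68, np.nan],
        "Score2": [72, 84, 20, 93, 97, 72],
        "Score3": [99, 91, 81, 88, 12, np.nan],
    }
)

cols = ["Score1", "Score2"]

# Drop rows with NaN in the ranking columns so nlargest only sees valid scores.
clean = df.dropna(subset=cols)

# Take the top 2 rows per Name, using Score2 to break ties on Score1,
# then stitch the per-group results back together.
top2 = pd.concat(
    [group.nlargest(2, cols) for _, group in clean.groupby("Name")]
).reset_index(drop=True)
print(top2)
```

Dropping the rows leaves the original values untouched, whereas filling changes the data and can let a filled row into the result when a group has fewer than two valid scores.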
