Replace the last row's value for each value of a specific column

Question

I have a dataframe df which looks something like this:

key	id
x	0.6
x	0.5
x	0.43
x	0.56
y	13
y	14
y	0.4
y	0.1

I'd like to replace the last value for every key with 0, so that the df looks like this:

key	id
x	0.6
x	0.5
x	0.43
x	0
y	13
y	14
y	0.4
y	0

I've tried the following:

for i in df['key'].unique():
   df.loc[df['key'] == i, 'id'].iat[-1] = 0

The problem is that this does not replace the actual value in the df. What am I missing? Perhaps there is also a better-performing way to tackle this problem.

Answer 1

Use Series.duplicated with keep='last' to flag every row except the last one per key, invert that mask, and set the selected rows to 0 with DataFrame.loc:

df.loc[~df['key'].duplicated(keep='last'), 'id'] = 0

print (df)
  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00
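On the "what am I missing" part of the question: selecting with a boolean mask through df.loc[...] builds a new Series, so the chained .iat[-1] = 0 writes into that temporary object and df never sees the change. The sketch below reproduces this and shows one way to keep the loop by assigning through a single .loc call on df itself. It assumes the index labels are unique; the names attempt, fixed and last_label are only illustrative. Recent pandas versions may also emit a chained-assignment warning for the first loop.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# Original attempt: the boolean selection returns a new Series, so the
# assignment through .iat lands on that temporary object, not on the frame.
attempt = df.copy()
for k in attempt["key"].unique():
    attempt.loc[attempt["key"] == k, "id"].iat[-1] = 0

# Loop variant that works: find the index label of the last row for each key,
# then assign with one .loc call on the frame (assumes unique index labels).
fixed = df.copy()
for k in fixed["key"].unique():
    last_label = fixed.loc[fixed["key"] == k].index[-1]
    fixed.loc[last_label, "id"] = 0

print(attempt["id"].equals(df["id"]))
print(fixed)
```

The vectorized mask above avoids the Python-level loop entirely, which is usually the better choice when there are many distinct keys.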

How the duplicated mask works:

print (df.assign(mask=df['key'].duplicated(keep='last'),
                 invert_mask=~df['key'].duplicated(keep='last')))
  key     id   mask  invert_mask
0   x   0.60   True        False
1   x   0.50   True        False
2   x   0.43   True        False
3   x   0.00  False         True
4   y  13.00   True        False
5   y  14.00   True        False
6   y   0.40   True        False
7   y   0.00  False         True

Another solution is to multiply the id column by the boolean mask, since True counts as 1 and False as 0:

df['id'] = df['key'].duplicated(keep='last').mul(df['id'])
print (df)
  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00
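One caveat with the multiplication trick, shown here as a small sketch rather than as part of the original answer: if the last value of a group is NaN, False * NaN is still NaN, so that row is not zeroed. Series.where is a possible alternative that replaces the value outright. The sample frame and the names keep, by_mul and by_where are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": ["x", "x", "y", "y"],
    "id": [0.6, np.nan, 0.4, np.nan],
})

# True for every row except the last one of each key
keep = df["key"].duplicated(keep="last")

# Multiplication: False * NaN stays NaN, so a missing last value survives
by_mul = keep.mul(df["id"])

# where(): keep id where the mask is True, otherwise substitute 0
by_where = df["id"].where(keep, 0)

print(by_mul.tolist())
print(by_where.tolist())
```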

Answer 2

You can use groupby.cumcount with ascending=False to number the rows of each group from the end, then pick the row you want with boolean indexing:

df.loc[df.groupby('key').cumcount(ascending=False).eq(0), 'id'] = 0

Output:

  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00

Intermediate steps:

  key     id  cumcount  eq(0)
0   x   0.60         3  False
1   x   0.50         2  False
2   x   0.43         1  False
3   x   0.56         0   True
4   y  13.00         3  False
5   y  14.00         2  False
6   y   0.40         1  False
7   y   0.10         0   True
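The counting is done per group, not per contiguous block, so the same mask also works when the keys are interleaved. A minimal sketch with a made-up frame (the name from_end is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["x", "y", "x", "y", "x"],
    "id": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Number the rows of each key from the end: 0 marks the last row of its group
from_end = df.groupby("key").cumcount(ascending=False)
print(df.assign(cumcount=from_end, is_last=from_end.eq(0)))

# Zero out the last row of each key
df.loc[from_end.eq(0), "id"] = 0
print(df)
```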

This adapts easily to any row. For example, for the second-to-last row per group:

df.loc[df.groupby('key').cumcount(ascending=False).eq(1), 'id'] = 0

For the third row per group, counting from the start:

df.loc[df.groupby('key').cumcount().eq(2), 'id'] = 0
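The variants above differ only in the counting direction and the position, so they can be wrapped in one small helper. This is a sketch, not part of the original answer; zero_nth_row and its parameters are hypothetical names. Groups with fewer than n + 1 rows have no matching position and are left unchanged.

```python
import pandas as pd


def zero_nth_row(df, n, from_end=False, key="key", col="id"):
    """Return a copy of df with col set to 0 on the n-th row (0-based) of each key group."""
    # cumcount numbers rows within each group; ascending=False counts from the end
    pos = df.groupby(key).cumcount(ascending=not from_end)
    out = df.copy()
    out.loc[pos.eq(n), col] = 0
    return out


df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

last_zeroed = zero_nth_row(df, 0, from_end=True)   # last row per key
third_zeroed = zero_nth_row(df, 2)                 # third row per key, from the start
print(last_zeroed)
print(third_zeroed)
```

Returning a copy keeps the input frame intact; assigning in place on df, as the one-liners above do, is equally valid.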

Answer 1: score 3, accepted, 2022-11-22 12:02:04
Answer 2: score 2, 2022-11-22 12:03:09
