Replacing the last row value of a specific column value

I have a DataFrame df that looks something like this:

key  id
x    0.6
x    0.5
x    0.43
x    0.56
y    13
y    14
y    0.4
y    0.1

I'd like to replace the last value for every key with 0, so that df looks like this:

key  id
x    0.6
x    0.5
x    0.43
x    0
y    13
y    14
y    0.4
y    0

I've tried the following:

for i in df['key'].unique():
   df.loc[df['key'] == i, 'id'].iat[-1] = 0

The problem is that it does not replace the actual value in df. What am I missing? Perhaps there is also a better-performing way to tackle this problem.

Use Series.duplicated to find the last row per key, then set 0 with DataFrame.loc:

df.loc[~df['key'].duplicated(keep='last'), 'id'] = 0

print (df)
  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00

How it works:

print (df.assign(mask=df['key'].duplicated(keep='last'),
                 invert_mask=~df['key'].duplicated(keep='last')))
  key     id   mask  invert_mask
0   x   0.60   True        False
1   x   0.50   True        False
2   x   0.43   True        False
3   x   0.00  False         True
4   y  13.00   True        False
5   y  14.00   True        False
6   y   0.40   True        False
7   y   0.00  False         True
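On the "what am I missing" part of the question: selecting with a boolean mask, as in df.loc[df['key'] == i, 'id'], returns a new Series, so the following .iat[-1] = 0 writes to that temporary object and never reaches df. Below is a minimal sketch of this, plus a loop variant that does write back. The DataFrame constructor call is reconstructed from the question's table, and the loop assumes the index labels are unique.

```python
import pandas as pd

# Sample data reconstructed from the question (the constructor call is an assumption).
df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# Boolean-mask selection returns a new Series, so this write only hits the copy.
tmp = df.loc[df["key"] == "x", "id"]
tmp.iat[-1] = 0
unchanged = df.loc[3, "id"]  # df itself still holds the original value

# Loop variant that writes through df: look up the index label of the
# last row per key, then assign with a single .loc call.
# Assumes the index labels are unique.
for k in df["key"].unique():
    last_label = df.index[df["key"] == k][-1]
    df.loc[last_label, "id"] = 0

print(df)
```

The vectorized duplicated mask avoids the Python-level loop entirely, which is likely to matter when there are many distinct keys.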

Another solution is simply to multiply the id column by the boolean mask:

df['id'] = df['key'].duplicated(keep='last').mul(df['id'])
print (df)
  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00
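One difference between the two approaches shows up with missing values: multiplying NaN by False still gives NaN, whereas an explicit assignment gives 0. The sketch below uses hypothetical data (not from the question) in which the last row of one key is NaN, and uses Series.mask as a stand-in for the .loc assignment so that both results can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row of key "x" is missing.
df = pd.DataFrame({
    "key": ["x", "x", "y", "y"],
    "id": [1.0, np.nan, 2.0, 3.0],
})

# True for every row that is NOT the last one of its key.
not_last = df["key"].duplicated(keep="last")

# Multiplication: False * NaN stays NaN.
by_mul = not_last.mul(df["id"])

# Explicit replacement: rows where the condition holds are set to 0.
by_mask = df["id"].mask(~not_last, 0)

print(pd.DataFrame({"by_mul": by_mul, "by_mask": by_mask}))
```

If the column can contain NaN or infinite values, the explicit assignment is probably the safer choice.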

You can use groupby.cumcount to access the nth row per group from the end (with ascending=False), together with boolean indexing:

df.loc[df.groupby('key').cumcount(ascending=False).eq(0), 'id'] = 0

Output:

  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00

Intermediate:

  key     id  cumcount  eq(0)
0   x   0.60         3  False
1   x   0.50         2  False
2   x   0.43         1  False
3   x   0.56         0   True
4   y  13.00         3  False
5   y  14.00         2  False
6   y   0.40         1  False
7   y   0.10         0   True
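A table like the intermediate one can be reproduced roughly as follows, run on the original data before any value is replaced. The DataFrame constructor call is an assumption based on the question's table, and the column names are chosen to match the table.

```python
import pandas as pd

# Sample data reconstructed from the question (the constructor call is an assumption).
df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# Position of each row within its group, counted from the end (last row is 0).
cc = df.groupby("key").cumcount(ascending=False)

# "eq(0)" is not a valid keyword argument name, so it is passed via a dict.
intermediate = df.assign(cumcount=cc, **{"eq(0)": cc.eq(0)})
print(intermediate)
```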

You can easily adapt this to any row, for example the second-to-last row per group:

df.loc[df.groupby('key').cumcount(ascending=False).eq(1), 'id'] = 0

For the third row per group:

df.loc[df.groupby('key').cumcount().eq(2), 'id'] = 0
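These variants could be wrapped in a small helper. The function name zero_nth_row and its parameters are hypothetical, introduced only for illustration, and the data is made up to show that groups with too few rows are left untouched.

```python
import pandas as pd

def zero_nth_row(df, key, col, n, from_end=False):
    """Return a copy of df with `col` set to 0 in the n-th row (0-based)
    of each `key` group, counted from the start or from the end."""
    # cumcount numbers rows within each group; ascending=False counts from the end.
    pos = df.groupby(key).cumcount(ascending=not from_end)
    out = df.copy()
    out.loc[pos.eq(n), col] = 0
    return out

# Hypothetical data: group "y" has a single row.
df = pd.DataFrame({
    "key": ["x", "x", "x", "y"],
    "id": [1.0, 2.0, 3.0, 4.0],
})

third = zero_nth_row(df, "key", "id", 2)                       # third row per group
second_last = zero_nth_row(df, "key", "id", 1, from_end=True)  # second-to-last row per group
print(third)
print(second_last)
```

Returning a copy rather than modifying df in place is a design choice for this sketch; the one-liners above modify df directly.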
