Replacing the last row's value in a column for each key
I have a dataframe `df` which looks something like this:
| key | id |
|---|---|
| x | 0.6 |
| x | 0.5 |
| x | 0.43 |
| x | 0.56 |
| y | 13 |
| y | 14 |
| y | 0.4 |
| y | 0.1 |
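To experiment with the answers below, the sample frame can be rebuilt as follows. This is a minimal sketch that assumes `id` is a plain numeric (float) column, with the values transcribed from the table above.

```python
import pandas as pd

# Sample data transcribed from the table above (assumed numeric 'id' column)
df = pd.DataFrame({
    'key': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'id': [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})
print(df)
```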
I'd like to replace the last value for every key with 0, so that `df` looks like this:
| key | id |
|---|---|
| x | 0.6 |
| x | 0.5 |
| x | 0.43 |
| x | 0 |
| y | 13 |
| y | 14 |
| y | 0.4 |
| y | 0 |
I've tried the following:
for i in df['key'].unique():
    df.loc[df['key'] == i, 'id'].iat[-1] = 0
The problem is that it does not replace the actual value in `df`. What am I missing? And perhaps there's an even better (faster) way to tackle this problem.
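A likely explanation, offered as a sketch rather than a definitive diagnosis: `df.loc[df['key'] == i, 'id']` with a boolean mask returns a new Series, so `.iat[-1] = 0` writes into that temporary object and `df` never sees the change. Keeping the loop but doing the assignment in a single `.loc` call on `df` itself does update it. This assumes the index labels are unique (as with the default RangeIndex); the name `last_label` is illustrative only. It is still a Python-level loop over the keys, so the vectorized answers below should scale better.

```python
import pandas as pd

df = pd.DataFrame({
    'key': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'id': [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

for k in df['key'].unique():
    # Index label of the last row whose key equals k (assumes unique labels)
    last_label = df.loc[df['key'] == k].index[-1]
    # One .loc call on df itself writes into df, not into a temporary copy
    df.loc[last_label, 'id'] = 0

print(df)
```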
Use `Series.duplicated` with `keep='last'` (inverted with `~`) to select the last row per key, and set it to 0 with `DataFrame.loc`:
df.loc[~df['key'].duplicated(keep='last'), 'id'] = 0
print (df)
key id
0 x 0.60
1 x 0.50
2 x 0.43
3 x 0.00
4 y 13.00
5 y 14.00
6 y 0.40
7 y 0.00
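Because `duplicated` looks at occurrences rather than at contiguous runs, the same line should also handle keys that are not grouped together. A small sketch with hypothetical interleaved data:

```python
import pandas as pd

# Hypothetical data where the keys are interleaved rather than grouped
df = pd.DataFrame({
    'key': ['x', 'y', 'x', 'y', 'x'],
    'id': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# duplicated(keep='last') is True for every occurrence except the last one,
# so its negation flags the last row of each key wherever it appears
last_rows = ~df['key'].duplicated(keep='last')
df.loc[last_rows, 'id'] = 0
print(df)
```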
How it works:
print (df.assign(mask=df['key'].duplicated(keep='last'),
                 invert_mask=~df['key'].duplicated(keep='last')))
key id mask invert_mask
0 x 0.60 True False
1 x 0.50 True False
2 x 0.43 True False
3 x 0.00 False True
4 y 13.00 True False
5 y 14.00 True False
6 y 0.40 True False
7 y 0.00 False True
Another solution is simply to multiply the `id` column by the boolean mask (True acts as 1, False as 0):
df['id'] = df['key'].duplicated(keep='last').mul(df['id'])
print (df)
key id
0 x 0.60
1 x 0.50
2 x 0.43
3 x 0.00
4 y 13.00
5 y 14.00
6 y 0.40
7 y 0.00
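One caveat with the multiplication approach, shown on hypothetical data: if the last value of a key is NaN, multiplying by False leaves it as NaN rather than 0. `Series.where` (a swapped-in alternative, not part of the answer above) keeps values where the mask is True and fills 0 elsewhere, so it zeroes that row as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the last 'x' row holds NaN
df = pd.DataFrame({'key': ['x', 'x', 'y', 'y'], 'id': [1.0, np.nan, 3.0, 4.0]})
not_last = df['key'].duplicated(keep='last')

# Multiplying: NaN * False is still NaN, so the last 'x' row is not zeroed
via_mul = not_last.mul(df['id'])

# where() keeps values where the mask is True and substitutes 0 elsewhere
via_where = df['id'].where(not_last, 0)

print(via_mul.tolist())
print(via_where.tolist())
```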
You can use `groupby.cumcount` to get each row's position within its group counted from the end (with `ascending=False`, so the last row is 0), and then use boolean indexing:
df.loc[df.groupby('key').cumcount(ascending=False).eq(0), 'id'] = 0
Output:
key id
0 x 0.60
1 x 0.50
2 x 0.43
3 x 0.00
4 y 13.00
5 y 14.00
6 y 0.40
7 y 0.00
Intermediate values (computed on the original df, before the assignment):
key id cumcount eq(0)
0 x 0.60 3 False
1 x 0.50 2 False
2 x 0.43 1 False
3 x 0.56 0 True
4 y 13.00 3 False
5 y 14.00 2 False
6 y 0.40 1 False
7 y 0.10 0 True
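The intermediate table can be reproduced with `DataFrame.assign`. A sketch, run on the original data before the assignment; the column names `cumcount` and `eq(0)` simply mirror the table:

```python
import pandas as pd

df = pd.DataFrame({
    'key': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'id': [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# Position of each row within its group, counted from the end (last row = 0)
rev_pos = df.groupby('key').cumcount(ascending=False)
# 'eq(0)' is not a valid keyword name, so it is passed via dict unpacking
intermediate = df.assign(cumcount=rev_pos, **{'eq(0)': rev_pos.eq(0)})
print(intermediate)
```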
You can easily adapt this to any row, for example the second-to-last row per group:
df.loc[df.groupby('key').cumcount(ascending=False).eq(1), 'id'] = 0
For the third row per group (counting from the start, so without `ascending=False`):
df.loc[df.groupby('key').cumcount().eq(2), 'id'] = 0
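If this is needed repeatedly, the pattern can be wrapped in a small function. `set_nth_per_group` below is a hypothetical helper, not a pandas API; it assumes 0-based positions and returns a modified copy instead of changing `df` in place.

```python
import pandas as pd

def set_nth_per_group(df, key, col, n, value, from_end=False):
    """Hypothetical helper: set `col` to `value` on the n-th row (0-based) of
    each `key` group, counting from the end when from_end=True."""
    # cumcount numbers rows within each group; ascending=False counts from the end
    pos = df.groupby(key).cumcount(ascending=not from_end)
    out = df.copy()
    out.loc[pos.eq(n), col] = value
    return out

df = pd.DataFrame({
    'key': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'id': [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})
last = set_nth_per_group(df, 'key', 'id', 0, 0, from_end=True)  # last row per key
third = set_nth_per_group(df, 'key', 'id', 2, 0)                # third row per key
print(last)
print(third)
```

Groups with fewer than n + 1 rows have no matching position and are left unchanged.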