Python/Pandas: How do I replace specific values of a Pandas DataFrame based on individual id?
I have a long Pandas dataset that contains a column called 'id' and another column called 'species', among other columns. I need to change values in the 'species' column based on specific values of the 'id' column.
For example, if the 'id' is '55555555' (as a string), then I want the 'species' value to change from its current value 'dove' (also a string) to 'hummingbird'. So far I have been using this method:
df.loc[df["id"] == '55555555', "species"] = 'hummingbird'
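Because the ids are stored as strings, the comparison value must be a string too. A minimal sketch (the two-row frame here is made up for illustration) showing that comparing against an int silently matches nothing:

```python
import pandas as pd

# Tiny illustrative frame: 'id' holds strings, as in the question's data
df = pd.DataFrame({"id": ["22222222", "55555555"], "species": ["dove", "dove"]})

# String comparison matches the intended row...
str_mask = df["id"] == "22222222"
# ...but an int never equals a str, so this mask is all False (no error raised)
int_mask = df["id"] == 22222222

df.loc[str_mask, "species"] = "hummingbird"

print(str_mask.sum(), int_mask.sum())  # prints: 1 0
print(df["species"].tolist())          # prints: ['hummingbird', 'dove']
```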
Here is a short sample data frame:
import pandas as pd
#Starting dataset
d = {'id': ['11111111', '22222222', '33333333', '44444444', '55555555', '66666666', '77777777', '88888888'], 'species': ['dove', 'dove', 'dove', 'hummingbird', 'hummingbird', 'dove', 'hummingbird', 'dove']}
df = pd.DataFrame(data=d)
df
         id      species
0  11111111         dove
1  22222222         dove   # to be replaced
2  33333333         dove   # to be replaced
3  44444444  hummingbird
4  55555555  hummingbird
5  66666666         dove
6  77777777  hummingbird
7  88888888         dove   # to be replaced
#Expected outcome
d = {'id': ['11111111', '22222222', '33333333', '44444444', '55555555', '66666666', '77777777', '88888888'], 'species': ['dove', 'hummingbird', 'hummingbird', 'hummingbird', 'hummingbird', 'dove', 'hummingbird', 'hummingbird']}
df = pd.DataFrame(data=d)
df
         id      species
0  11111111         dove
1  22222222  hummingbird   # replaced
2  33333333  hummingbird   # replaced
3  44444444  hummingbird
4  55555555  hummingbird
5  66666666         dove
6  77777777  hummingbird
7  88888888  hummingbird   # replaced
This is fine for a small number of rows, but I have to do this for about 1000 rows, each with its own 'id'. I thought maybe a loop that I could feed the list of 'id' values to would work, but I honestly do not know how to even start.
Thanks in advance!!
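The loop the question describes can be sketched directly on the sample data; `ids_to_change` is a hypothetical name for the list of ids to relabel. It produces the expected outcome, but it rescans the whole 'id' column once per id, so with ~1000 ids it does ~1000 full passes:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["11111111", "22222222", "33333333", "44444444",
           "55555555", "66666666", "77777777", "88888888"],
    "species": ["dove", "dove", "dove", "hummingbird",
                "hummingbird", "dove", "hummingbird", "dove"],
})

# Hypothetical list of ids whose species should become 'hummingbird'
ids_to_change = ["22222222", "33333333", "88888888"]

# Naive approach: build one boolean mask and do one assignment per id
for some_id in ids_to_change:
    df.loc[df["id"] == some_id, "species"] = "hummingbird"

print(df["species"].tolist())
```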
And thanks to Scott Boston for pointing me in the right direction to ask better questions!
Use isin:
# ids must be strings to match the string values in df["id"]
humming_ids = ['22222222', '33333333', '88888888']
df.loc[df["id"].isin(humming_ids), "species"] = 'hummingbird'
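A self-contained sketch of the isin approach on the sample data, assuming (hypothetically) that the list of ids arrives as integers, e.g. read from a file. Casting each id to str first makes the single vectorized mask match the string 'id' column:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["11111111", "22222222", "33333333", "44444444",
           "55555555", "66666666", "77777777", "88888888"],
    "species": ["dove", "dove", "dove", "hummingbird",
                "hummingbird", "dove", "hummingbird", "dove"],
})
expected = ["dove", "hummingbird", "hummingbird", "hummingbird",
            "hummingbird", "dove", "hummingbird", "hummingbird"]

# Assumed input: ids as ints; convert to str so they compare equal to df["id"]
humming_ids = [22222222, 33333333, 88888888]
humming_ids = [str(i) for i in humming_ids]

# One pass over the column, regardless of how many ids are in the list
mask = df["id"].isin(humming_ids)
df.loc[mask, "species"] = "hummingbird"

print(mask.sum())                          # prints: 3
print(df["species"].tolist() == expected)  # prints: True
```

Building the mask once with isin avoids the per-id rescans of a Python loop, which matters more as the id list grows.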