Pandas df: change the value of a row in one column based on a value in a dictionary matching a row in a different column
I have a df (pictured below) and I want to change the value of "sic_code" depending on the "code".
I have created a dictionary:
comp_dict = dict(zip(sic_dict_keys, sic_dict_values))
and was thinking of something like this, but then got stuck. I basically want to change the value of sic_code if the code number is in my dictionary, e.g. changing the sic_code 2834 to 3000 for the code 1611787:
for key in comp_dict:
    if df.loc[df["code"] == key]:
Pandas DataFrames have a replace method for exactly this operation:
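import pandas as pd
df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [100, 200, 300]})
rename_dict = {100: 1000, 200: 2000}
# replace() returns a new Series, so assign it back to the column
df['b'] = df['b'].replace(rename_dict)
print(df)
which results in:
   a     b
0  1  1000
1  2  2000
2  3   300
Values that are not keys of the dictionary (300 here) are left unchanged. Assigning the result back is more reliable than calling replace with inplace=True on df['b']: with Copy-on-Write in recent pandas versions, an in-place call on a selected column may not update df.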
I'm not sure what you want to do. I wrote the code so that if a row has a sic_code corresponding to a key of comp_dict, the sic_code is changed to the corresponding value of comp_dict. Tell me if I got you wrong.
For the loop, I'd rather iterate over the dataframe than over the dictionary. For instance, by using iterrows and loc, a simple loop could look like this:
for index, row in df.iterrows():
    sic_code = row['sic_code']
    if sic_code in comp_dict:
        # iterrows yields index labels, so assign by label with loc
        df.loc[index, 'sic_code'] = comp_dict[sic_code]
Here, if your dict contains {2834: 3000}, all the rows with a sic_code value of 2834 will be changed to 3000.
Here is a two-step approach to the question. First, find the data frame entries that exist in the code-to-sic_code dictionary. Second, use the .map() function to update the sic code:
df = (pd.DataFrame(
        {'code': [1611787, 170846, 142529],
         'name': ['Advanced', 'Perth', 'ATA Creativity'],
         'sic_code': [2834, 6221, 8200]})
      .set_index('code')
     )
# key is `code`; value is `sic_code`
comp_dict = {1611787: 3000}
# find data frame entries such that `code` is in the dictionary
mask = df.index.isin(comp_dict)
# update `sic_code`
df.loc[mask, 'sic_code'] = df.index[mask].map(comp_dict)
df
The resulting data frame is:
                   name  sic_code
code
1611787        Advanced      3000
170846            Perth      6221
142529   ATA Creativity      8200
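The same update can also be written without moving `code` into the index. This is only a minimal sketch, assuming the sample frame from this answer and that every value in the dictionary is an integer (the cast back to int depends on that):

```python
import pandas as pd

# Sample data copied from the answer; `code` stays a regular column here.
df = pd.DataFrame(
    {'code': [1611787, 170846, 142529],
     'name': ['Advanced', 'Perth', 'ATA Creativity'],
     'sic_code': [2834, 6221, 8200]})

# key is `code`; value is the new `sic_code`
comp_dict = {1611787: 3000}

# map() gives NaN for codes that are not in the dictionary;
# fillna() puts the existing sic_code back for those rows.
new_sic = df['code'].map(comp_dict).fillna(df['sic_code'])

# The intermediate NaN forces a float dtype, so cast back to int
# (assumption: all sic codes are whole numbers).
df['sic_code'] = new_sic.astype(int)
print(df)
```

Rows whose `code` is missing from the dictionary keep their current `sic_code`, which matches the behaviour of the masked assignment.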
I got this to work how I want it to, but I don't know if it's the most efficient.
for index, row in df.iterrows():
    print(row['code'], row['sic_code'])
    for key in comp_dict:
        # print(key)
        if row['code'] == key:
            # assign via .loc rather than chained indexing so the write reaches df
            df.loc[index, 'sic_code'] = comp_dict[key]
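On the efficiency question: the inner loop compares every row against every key, whereas a dictionary can be queried directly. The following is only a sketch of that idea, assuming a small stand-in frame with the same `code` and `sic_code` columns; it does one dictionary lookup per row instead of looping over the keys:

```python
import pandas as pd

# Stand-in data with the same column names as in the question.
df = pd.DataFrame(
    {'code': [1611787, 170846, 142529],
     'sic_code': [2834, 6221, 8200]})
comp_dict = {1611787: 3000}

# dict.get(code, current) returns the replacement when `code` is a key,
# otherwise the current sic_code, so no inner loop over the keys is needed.
df['sic_code'] = [
    comp_dict.get(code, current)
    for code, current in zip(df['code'], df['sic_code'])
]
print(df)
```

Because the whole column is rebuilt in one assignment, there is also no need to write to individual cells while iterating.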