Recoding multiple integer variables into one in Python
Each record represents a person. The code 250.000 means diabetes, and I would like to create a DXDiabetes column that is 1 if 250 appears in any of Code1, Code2, or Code3.
import pandas as pd
data_prep = pd.DataFrame({"Code1" : [250.000,276.000,401.000,414.000],
"Code2" : [403.000,411.000,414.000,250.000],
"Code3" : [427.000,250.000,486.000,682.000]})
data_prep
However, the "1" coding from Code1 is lost by the time I get to Code3: DXDiabetes only keeps the last recode.
data_prep['DXDiabetes']=data_prep['Code1'].apply(lambda x: 1 if round(x,0) == 250 else 0)
data_prep['DXDiabetes']=data_prep['Code2'].apply(lambda x: 1 if round(x,0) == 250 else None)
data_prep['DXDiabetes']=data_prep['Code3'].apply(lambda x: 1 if round(x,0) == 250 else None)
print(data_prep['DXDiabetes'].value_counts())
Is there a way to have DXDiabetes = 1 if any of Code1, Code2, or Code3 == 250?
Many thanks,
Sandra
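Each of the three assignments in the attempt above replaces the whole DXDiabetes column, so only the Code3 pass survives (and its else branch writes None rather than 0). A minimal sketch of keeping the earlier matches, assuming the same round(..., 0) == 250 test as the attempt, is to start from an all-False mask and OR each column's match into it instead of overwriting:

```python
import pandas as pd

data_prep = pd.DataFrame({"Code1": [250.000, 276.000, 401.000, 414.000],
                          "Code2": [403.000, 411.000, 414.000, 250.000],
                          "Code3": [427.000, 250.000, 486.000, 682.000]})

# Start with "no diabetes" for every row, then OR in each column's match,
# so a hit in Code1 is not erased when Code2 or Code3 is checked.
mask = pd.Series(False, index=data_prep.index)
for col in ["Code1", "Code2", "Code3"]:
    mask |= data_prep[col].round(0).eq(250)

# Convert True/False to 1/0.
data_prep["DXDiabetes"] = mask.astype(int)
print(data_prep["DXDiabetes"].tolist())  # [1, 1, 0, 1]
```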
You can use np.where, assigning a value of 1 if the condition is True and 0 if it is False. The condition checks whether any of the three columns in a row equals 250.
import numpy as np
data_prep['DXDiabetes'] = np.where(
data_prep[['Code1', 'Code2', 'Code3']].eq(250).any(axis=1), 1, 0)
>>> data_prep
Code1 Code2 Code3 DXDiabetes
0 250.0 403.0 427.0 1
1 276.0 411.0 250.0 1
2 401.0 414.0 486.0 0
3 414.0 250.0 682.0 1
Note that you first check for equality:
>>> data_prep[['Code1', 'Code2', 'Code3']].eq(250)
Code1 Code2 Code3
0 True False False
1 False False True
2 False False False
3 False True False
And then you check whether any value in each row is True by specifying .any(axis=1):
>>> data_prep[['Code1', 'Code2', 'Code3']].eq(250).any(axis=1)
0 True
1 True
2 False
3 True
dtype: bool
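Two small variations on this answer, sketched under stated assumptions. First, np.where(cond, 1, 0) gives the same result as casting the boolean row mask with .astype(int). Second, eq(250) is an exact comparison; the question's attempt used round(x, 0), which suggests the codes might carry decimal parts. Assuming any code whose integer part is 250 should count, truncating with np.floor before comparing catches those (rounding would not, since round(250.6, 0) is 251). The fifth row below is hypothetical data added only to show the difference:

```python
import numpy as np
import pandas as pd

# The question's four rows plus one hypothetical row whose Code2 carries
# a decimal subcode (250.01).
data_prep = pd.DataFrame({"Code1": [250.0, 276.0, 401.0, 414.0, 300.0],
                          "Code2": [403.0, 411.0, 414.0, 250.0, 250.01],
                          "Code3": [427.0, 250.0, 486.0, 682.0, 401.0]})
codes = ["Code1", "Code2", "Code3"]

# Casting the boolean row mask to int gives the same 1/0 result as np.where.
exact = data_prep[codes].eq(250).any(axis=1).astype(int)

# Truncating to the integer part first also counts decimal subcodes.
truncated = np.floor(data_prep[codes]).eq(250).any(axis=1).astype(int)

print(exact.tolist())      # [1, 1, 0, 1, 0]
print(truncated.tolist())  # [1, 1, 0, 1, 1]
```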
The following should work:
data_prep['DXDiabetes']=data_prep.apply(lambda x: 1 if any(i==250 for i in x) else 0, axis=1)
>>> print(data_prep)
Code1 Code2 Code3 DXDiabetes
0 250.0 403.0 427.0 1
1 276.0 411.0 250.0 1
2 401.0 414.0 486.0 0
3 414.0 250.0 682.0 1
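The row-wise apply above scans every column of data_prep, so any other numeric column in the frame (for example, a hypothetical age or cost field) would be tested against 250 as well. A sketch that restricts the scan to the three code columns (codes is an illustrative name, not from the original):

```python
import pandas as pd

data_prep = pd.DataFrame({"Code1": [250.000, 276.000, 401.000, 414.000],
                          "Code2": [403.000, 411.000, 414.000, 250.000],
                          "Code3": [427.000, 250.000, 486.000, 682.000]})
codes = ["Code1", "Code2", "Code3"]  # only these columns are scanned

# Row-wise: 1 if any code in the row equals 250, else 0.
data_prep["DXDiabetes"] = data_prep[codes].apply(
    lambda row: int(any(v == 250 for v in row)), axis=1)

print(data_prep["DXDiabetes"].tolist())  # [1, 1, 0, 1]
```

Row-wise apply calls the Python lambda once per row, so on large frames the vectorized eq(250).any(axis=1) approach is generally faster.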