I have input data as shown below. Here 'gender' and 'ethderived' are two columns. I would like to replace their numeric values (1, 2, 3, etc.) with categorical labels, e.g. 1 with Male and 2 with Female.
The mapping file looks like this (a sample of two columns):
The input data looks like this:
I expect my output dataframe to look like this:
I have tried to do this using the code below. Although the code runs without errors, no replacement happens. Can you please help me with this?
mapp = pd.read_csv('file2.csv')
data = pd.read_csv('file1.csv')
for col in mapp:
    if col in data.columns:
        print(col)
        s = list(mapp.loc[(mapp[col].str.contains('^\d')==True)].index)
        print("s is",s)
        for i in s:
            print("i is",i)
            try:
                value = mapp[col][i].split('. ')
                print("value 0 is",value[0])
                print("value 1 is",value[1])
                if value[0] in data[col].values:
                    data.replace({col:{value[0]:value[1]}})
            except:
                print("column not present")
    else:
        print("No")
Please note that I have shown only two columns, but in the real data there might be more than 600 columns. Any elegant approach or suggestion to simplify this would be helpful. As I have two separate CSV files, suggestions on merge/join are also welcome, but please note that my mapping file contains values such as "1. Male" and "2. Female", hence I used a regex.
Also note that several other columns can have mapping values that start with 1, e.g. 1. Single, 2. Married, 3. Divorced.
Looking forward to your help
Use DataFrame.replace with nested dictionaries: the outer key is the column name, and the inner dictionary maps each code to its label. The inner dictionaries are built with Series.str.extract:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Gender':['1.Male','2.Female', np.nan],
                   'Ethnicity':['1.Chinese','2.Indian','3.Malay']})
print (df)
     Gender  Ethnicity
0    1.Male  1.Chinese
1  2.Female   2.Indian
2       NaN    3.Malay

#split each entry into code and label, then build one dictionary per column
d={x:df[x].str.extract(r'(\d+)\.(.+)').dropna().set_index(0)[1].to_dict() for x in df.columns}
print (d)
{'Gender': {'1': 'Male', '2': 'Female'},
 'Ethnicity': {'1': 'Chinese', '2': 'Indian', '3': 'Malay'}}

df1 = pd.DataFrame({'Gender':[2,1,2,1],
                    'Ethnicity':[1,2,3,1]})
print (df1)
   Gender  Ethnicity
0       2          1
1       1          2
2       2          3
3       1          1

#convert to strings before replace, because the dictionary keys are strings
df2 = df1.astype(str).replace(d)
print (df2)
   Gender Ethnicity
0  Female   Chinese
1    Male    Indian
2  Female     Malay
3    Male   Chinese
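The same idea can be adapted to the two-file setup in the question, where the mapping entries contain a space ("1. Male") and only some columns are shared. The following is a minimal sketch under assumptions, not a definitive implementation: the two small frames stand in for file2.csv and file1.csv, the 'notes' and 'age' columns are made up for illustration, and build_mapping is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for pd.read_csv('file2.csv') and pd.read_csv('file1.csv')
mapp = pd.DataFrame({'gender': ['1. Male', '2. Female', np.nan],
                     'ethderived': ['1. Chinese', '2. Indian', '3. Malay'],
                     'notes': ['free text', np.nan, np.nan]})
data = pd.DataFrame({'gender': [2, 1, 2, 1],
                     'ethderived': [1, 2, 3, 1],
                     'age': [30, 41, 25, 57]})

def build_mapping(mapp, data):
    """Build {column: {code: label}} for columns present in both frames."""
    d = {}
    for col in mapp.columns.intersection(data.columns):
        # r'\s*' tolerates both '1.Male' and '1. Male'
        parts = mapp[col].astype(str).str.extract(r'^(\d+)\.\s*(.+)$').dropna()
        if not parts.empty:
            d[col] = dict(zip(parts[0], parts[1]))
    return d

d = build_mapping(mapp, data)
cols = list(d)

# The dictionary keys are strings, so only the mapped columns are cast to str;
# the remaining columns (here 'age') keep their original dtype.
result = data.copy()
result[cols] = data[cols].astype(str).replace(d)
print(result)
```

Note that replace returns a new DataFrame rather than modifying the original, so the result has to be assigned. The loop in the question discards that return value, and it also compares string codes such as '1' against integer data, which is why no replacement appears there.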
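If the entries are always in order (1.XXX, 2.XXX, ...), use the following, where df1 is the mapping frame and df2 is the numeric input data (the names differ from the answer above):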
m=df1.apply(lambda x: x.str[2:])
n=df2.sub(1).replace(m)
print(n)
   gender ethderived
0  Female    Chinese
1    Male     Indian
2    Male      Malay
3  Female    Chinese
4    Male    Chinese
5  Female     Indian
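A more explicit variant of this positional idea is sketched below. It is not the code from the answer above: instead of subtracting 1 and passing a DataFrame to replace, it builds a plain nested dictionary whose keys are the 1-based positions of the labels. The frame names mapping and codes, and the sample values, are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical mapping frame: entries are assumed to be listed in order 1, 2, 3, ...
mapping = pd.DataFrame({'gender': ['1.Male', '2.Female', None],
                        'ethderived': ['1.Chinese', '2.Indian', '3.Malay']})
# Hypothetical numeric input data
codes = pd.DataFrame({'gender': [2, 1, 1],
                      'ethderived': [1, 2, 3]})

lookup = {}
for col in mapping.columns:
    # Keep the text after the first dot and trim surrounding whitespace
    labels = mapping[col].dropna().str.split('.', n=1).str[1].str.strip()
    # Position in the mapping column (starting at 1) becomes the code
    lookup[col] = dict(enumerate(labels, start=1))

decoded = codes.replace(lookup)
print(decoded)
```

Splitting on the first dot rather than slicing with str[2:] means the sketch does not depend on single-digit codes or on whether a space follows the dot. It still relies on the entries being listed in order, since the numeric prefix itself is ignored.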