The pandas package in Python has the get_dummies transformation, which converts categorical variables into binary (flag) variables with values 0/1. The transformation is derived from the values actually present in the data, but I'd like to store the transformation itself, so that I can run it on other datasets that contain fewer distinct values and still get the full-sized transformed data structure.
Say you have this code:
import pandas as pd
a = [[5,12,"blue"], [8,53,"yellow"]]
df = pd.DataFrame(a, columns=['Weight','Size','Color'])
df = df.apply(pd.to_numeric, errors='ignore')  # apply() returns a new frame, so assign it
df
Producing this data:
Weight  Size  Color
5       12    blue
8       53    yellow
and:
df = pd.get_dummies(df)
df
produces this:
Weight  Size  Color_blue  Color_yellow
5       12    1           0
8       53    0           1
I'd like to store this original transformation, so that if I get a record later, like:
[2,9,"blue"]
I can still get the whole structure, like:
Weight  Size  Color_blue  Color_yellow
2       9     1           0
However, get_dummies will omit the Color_yellow column in the latter case, because "yellow" does not occur in the new data.
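The behaviour can be reproduced with a minimal sketch like the one below. It assumes the later record is wrapped in its own DataFrame; the names `new_df` and `new_dummies` are only illustrative.

```python
import pandas as pd

# Hypothetical later record, wrapped in its own one-row DataFrame
new_df = pd.DataFrame([[2, 9, "blue"]], columns=["Weight", "Size", "Color"])

# get_dummies only knows about the values present in this frame,
# so no column is created for "yellow"
new_dummies = pd.get_dummies(new_df)
print(list(new_dummies.columns))
```

The printed column list contains Color_blue but no Color_yellow, which is the gap described above.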
What is the simplest solution to it?
I was thinking of building my own get_dummies function, which goes through all the categorical variables, collects their distinct values, and then generates the Python code that performs the transformation. But there must be an already implemented solution for this...
This is what I was looking for. The code prints the transformations that have to be applied to later datasets:
import pandas as pd
import numpy as np
a = [[5,12,"blue","apple"], [8,53,"yellow","pear"], [1,8,"brown","peach"],[1,2,"blue","plum"]]
df = pd.DataFrame(a, columns=['Weight','Size','Color','Fruit'])
df = df.apply(pd.to_numeric, errors='ignore')  # apply() returns a new frame, so assign it
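# Replace each text (object) column with one 0/1 flag column per distinct value
for col in df.select_dtypes(include=["object"]).columns:
    for i in df[col].unique():
        df[col+"_"+i] = np.where(df[col] == i, 1, 0)
        # Print the equivalent statement so it can be replayed on later datasets
        print('df["'+col+'_'+i+'"] = np.where(df["'+col+'"] == "'+i+'", 1, 0)')
    df = df.drop(columns=[col])
    print('df = df.drop(columns=["'+col+'"])')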
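As an alternative to generating code, one commonly used pattern is to remember the column list that get_dummies produced on the original data and reindex later frames to it. This is only a sketch, not the approach above: `train_columns` and `encode_like_training` are hypothetical names, and it assumes later data contains no categories that were absent from the original data (columns for such values would be dropped silently by the reindex).

```python
import pandas as pd

# Original data: defines the full set of dummy columns
train = pd.DataFrame(
    [[5, 12, "blue"], [8, 53, "yellow"]],
    columns=["Weight", "Size", "Color"],
)
# dtype=int keeps the flags as 0/1 instead of booleans
train_dummies = pd.get_dummies(train, dtype=int)
train_columns = list(train_dummies.columns)  # the stored "transformation"


def encode_like_training(frame, columns=train_columns):
    """Dummy-encode `frame`, then align it to the stored column layout."""
    dummies = pd.get_dummies(frame, dtype=int)
    # Missing dummy columns are added and filled with 0;
    # columns not in `columns` are dropped.
    return dummies.reindex(columns=columns, fill_value=0)


new = pd.DataFrame([[2, 9, "blue"]], columns=["Weight", "Size", "Color"])
encoded = encode_like_training(new)
print(encoded)
```

Only the column list needs to be stored (for example in a text or JSON file), so the same layout can be restored in a later session.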