
Create one hot column names based on column values in pandas

Apologies if something similar has been asked before; I searched around but couldn't figure out a solution.

I have a df like this:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Revenue': ["This year,Last Year", "This year", np.nan],
                    'Cost': ["This year,Last Year", "This year", np.nan]})


and I'm trying to get it into the format below, where each original column becomes two separate indicator columns, one for "This year" (TY) and one for "Last Year" (LY):

df2 = pd.DataFrame({'RevenueTY':[1,1,0],
                    'RevenueLY':[1,0,0],
                    'CostTY':[1,1,0],
                    'CostLY':[1,0,0]})

Any help is appreciated, thank you!


You can try str.get_dummies, which splits each string on a separator and returns one indicator column per distinct label:

pd.concat([
  df1.Revenue.str.get_dummies(',').add_prefix('Revenue '), 
  df1.Cost.str.get_dummies(',').add_prefix('Cost ')
], axis=1)

#   Revenue Last Year  Revenue This year  Cost Last Year  Cost This year
#0                  1                  1               1               1
#1                  0                  1               0               1
#2                  0                  0               0               0

Or, to be more programmatic:

cols = ['Revenue', 'Cost']
pd.concat(
  [df1[x].str.get_dummies(',').add_prefix(x + ' ') for x in cols], 
  axis=1
)
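Note that the snippets above produce names like "Revenue Last Year" rather than the RevenueTY / RevenueLY names asked for in the question. A minimal sketch of one way to get those exact names follows. It assumes the only labels in the data are "This year" and "Last Year"; the suffixes dictionary is a hypothetical helper introduced here for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Revenue': ["This year,Last Year", "This year", np.nan],
    'Cost': ["This year,Last Year", "This year", np.nan],
})

# Hypothetical mapping from the labels found in the data to the
# suffixes used in the desired output (an assumption for this sketch).
suffixes = {'This year': 'TY', 'Last Year': 'LY'}

parts = []
for col in ['Revenue', 'Cost']:
    # One 0/1 column per distinct comma-separated label; NaN rows become all zeros.
    dummies = df1[col].str.get_dummies(',')
    # Force both labels to exist, in a fixed order, even if one never occurs.
    dummies = dummies.reindex(columns=list(suffixes), fill_value=0)
    # Rename e.g. 'This year' -> 'RevenueTY'.
    dummies.columns = [col + suffixes[label] for label in dummies.columns]
    parts.append(dummies)

df2 = pd.concat(parts, axis=1)
print(df2)
```

The reindex step is optional for this sample data, but it keeps the column order stable (TY before LY) and guards against a label that never appears in a given column.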
