
Python pandas: dynamic concatenation from get_dummies

I have the following DataFrame:

import pandas as pd

cars = ["BMV", "Mercedes", "Audi"]
customer = ["Juan", "Pepe", "Luis"]
price = [100, 200, 300]
year = [2022, 2021, 2020]


df_raw = pd.DataFrame(list(zip(cars, customer, price, year)),\
                      columns=["cars", "customer", "price", 'year'])

I need to one-hot encode the categorical variables cars and customer, so I call get_dummies on each of these two columns and concatenate the results with the numerical columns.

numerical = ["price", "year"]
df_final = pd.concat([df_raw[numerical], pd.get_dummies(df_raw.cars),\
                      pd.get_dummies(df_raw.customer)], axis=1)

Is there a way to generate these dummies dynamically, for example by putting the column names in a list and looping over them with a for loop? With only two columns this is simple, but if I had 30 or 60 attributes, would I have to handle them one by one?

Use pd.get_dummies with its columns parameter, which accepts a list of column names to encode:

pd.get_dummies(df_raw, columns=['cars', 'customer'])

   price  year  cars_Audi  cars_BMV  cars_Mercedes  customer_Juan  customer_Luis  customer_Pepe
0    100  2022          0         1              0              1              0              0
1    200  2021          0         0              1              0              0              1
2    300  2020          1         0              0              0              1              0
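For 30 or 60 attributes, the list passed to columns can itself be built from the data. The following is a minimal sketch, assuming that every non-numeric column should be encoded; the names categorical and df_final are illustrative. Note that pandas 2.0 and later return boolean dummies by default, so dtype=int is passed here to get the 0/1 values shown above.

```python
import pandas as pd

df_raw = pd.DataFrame({
    "cars": ["BMV", "Mercedes", "Audi"],
    "customer": ["Juan", "Pepe", "Luis"],
    "price": [100, 200, 300],
    "year": [2022, 2021, 2020],
})

# Assumption: every non-numeric column is categorical and should be encoded.
categorical = df_raw.select_dtypes(exclude="number").columns.tolist()

# dtype=int yields 0/1 integers; recent pandas versions default to bool.
df_final = pd.get_dummies(df_raw, columns=categorical, dtype=int)
print(df_final)
```

If some non-numeric columns should not be encoded (free text, identifiers), an explicit list of names is the safer choice.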

One simple way is to join the values of the selected columns in each row into a single '|'-separated string and then use str.get_dummies:

cols = ['cars', 'customer']
out = df_raw.join(df_raw[cols].agg('|'.join, axis=1).str.get_dummies())

Output:

       cars customer  price  year  Audi  BMV  Juan  Luis  Mercedes  Pepe
0       BMV     Juan    100  2022     0    1     1     0         0     0
1  Mercedes     Pepe    200  2021     0    0     0     0         1     1
2      Audi     Luis    300  2020     1    0     0     1         0     0
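One caveat with this approach: the dummy columns are named after the raw values, so the same value appearing in two different source columns would collapse into a single indicator, and a value containing '|' would be split. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to prefix each value with its column name before joining (the name prefixed is illustrative):

```python
import pandas as pd

df_raw = pd.DataFrame({
    "cars": ["BMV", "Mercedes", "Audi"],
    "customer": ["Juan", "Pepe", "Luis"],
    "price": [100, 200, 300],
    "year": [2022, 2021, 2020],
})
cols = ["cars", "customer"]

# Prefix each value with its column name so that identical values
# coming from different columns stay distinct.
prefixed = df_raw[cols].apply(lambda s: s.name + "_" + s.astype(str))

# Join each row into one '|'-separated string, then split into indicators.
out = df_raw.join(prefixed.agg("|".join, axis=1).str.get_dummies())
print(out)
```

This produces column names in the same cars_Audi, customer_Juan style as pd.get_dummies while keeping the original cars and customer columns.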

Another option is to melt the selected columns into long format and use crosstab:

df2 = df_raw[cols].reset_index().melt('index')
out = df_raw.join(pd.crosstab(df2['index'], df2['value']))
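For reference, here is a self-contained, commented version of the melt and crosstab steps, using the same sample data as the question. The intermediate name dummies is illustrative. Keep in mind that crosstab returns counts rather than indicators, so a value occurring in two of the selected columns on the same row would yield 2.

```python
import pandas as pd

df_raw = pd.DataFrame({
    "cars": ["BMV", "Mercedes", "Audi"],
    "customer": ["Juan", "Pepe", "Luis"],
    "price": [100, 200, 300],
    "year": [2022, 2021, 2020],
})
cols = ["cars", "customer"]

# Long format: one row per (original row index, source column, value).
df2 = df_raw[cols].reset_index().melt("index")

# Count how often each value occurs for each original row index.
dummies = pd.crosstab(df2["index"], df2["value"])

# Align the counts back onto the original rows by index.
out = df_raw.join(dummies)
print(out)
```

With this sample data the result has the same columns as the str.get_dummies output above.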
