Need help creating pseudo dummy variables that use the value from another column instead of "1"

Question

I have a dataframe that looks like this:

A     B    C

34    x    a
3     y    b
23    y    a
40    x    b

Essentially, columns B and C need to become dummy variables with the headers B_x, B_y, C_a, and C_b. This is almost exactly how get_dummies() works in pandas, with one major difference: wherever a dummy variable would be 1, I need the value from column A instead. Something like:

A     B_x   B_y  C_a C_b

34    34    0    34  0
3     0     3    0   3
23    0     23   23  0
40    40    0    0   40

I'm working with fairly large data with a high number of categories.

I've tried using get_dummies() on the dataset and then df.mask to change all 1s to df.A, but this is atrociously slow (about 10 minutes).

Answer 1

Use pd.get_dummies and broadcast column A across the resulting dummy columns:

df2 = pd.get_dummies(df[['B', 'C']]) * df.A.values.reshape([-1,1])

    B_x B_y C_a C_b
0   34  0   34  0
1   0   3   0   3
2   0   23  23  0
3   40  0   0   40
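For anyone who wants to reproduce this, here is a minimal self-contained sketch built from the sample data in the question. The `dtype=int` argument and the `.mul(..., axis=0)` variant are my additions, not part of the original one-liner: newer pandas versions return boolean dummies by default, so asking for integers keeps the result numeric, and `.mul` with `axis=0` should give the same values while aligning on the index instead of relying on a reshaped array.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [34, 3, 23, 40],
    "B": ["x", "y", "y", "x"],
    "C": ["a", "b", "a", "b"],
})

# dtype=int is an assumption added here so the indicators are 0/1 integers
# regardless of the pandas version's default dummy dtype.
dummies = pd.get_dummies(df[["B", "C"]], dtype=int)

# Reshape A to an (n, 1) column so it broadcasts across every dummy column.
scaled = dummies * df["A"].to_numpy().reshape(-1, 1)

# Alternative: multiply row-wise by A, aligned on the index.
scaled_alt = dummies.mul(df["A"], axis=0)

print(scaled)
```

Both forms are a single vectorized multiplication over the dummy matrix, so no per-cell masking is involved.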

To add A back, there are many alternatives. You can do df2['A'] = df['A'], or use pd.concat:

pd.concat([df.A, df2], axis=1)
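As a rough end-to-end sketch of that last step, the snippet below rebuilds `df2` from the question's sample data and then puts A back in front. The `join` variant is just one more option I am assuming is acceptable here; it is not from the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [34, 3, 23, 40],
    "B": ["x", "y", "y", "x"],
    "C": ["a", "b", "a", "b"],
})

# Dummy columns scaled by A (dtype=int is an assumption to keep them numeric).
df2 = pd.get_dummies(df[["B", "C"]], dtype=int).mul(df["A"], axis=0)

# Column-wise concat places A first, matching the layout asked for.
result = pd.concat([df["A"], df2], axis=1)

# Hypothetical alternative: join the dummy columns onto a one-column frame.
result_join = df[["A"]].join(df2)

print(result)
```

Note that `df2['A'] = df['A']` appends A as the last column, whereas the concat form keeps it first.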
