pandas get_dummies with same/identical column names

Question

I have:

In [122]: d=pandas.DataFrame({'d_1':['a','x'],'d_2':['x','y']})

In [123]: d
Out[123]: 
  d_1 d_2
0   a   x
1   x   y

I want:

    a   x   y 
0   1   1   0 
1   0   1   1

I do not want to use:

In [139]: pandas.get_dummies(d)
Out[139]: 
   d_1_a  d_1_x  d_2_x  d_2_y
0    1.0    0.0    1.0    0.0
1    0.0    1.0    0.0    1.0

Because this function treats d_1_x and d_2_x as distinct columns, and that takes too much memory for my application.

I do, however, want to use get_dummies because it is fast, so I tried renaming the columns and applying get_dummies:

In [124]: d.columns=['d' for el in d.columns]

In [141]: d
Out[141]: 
   d  d
0  a  x
1  x  y

In [151]: pandas.get_dummies(d)
Out[151]: 
   d_('d',)  d_('d',)
0       1.0       1.0
1       1.0       1.0
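Since the rename trick does not give the intended result, here is a minimal sketch that still relies on get_dummies, assuming pandas 1.x or 2.x. Passing prefix='' and prefix_sep='' names every dummy column after the value alone, so x from d_1 and x from d_2 both become a column named x; those duplicates are then merged with a max. The helper name dummies_without_prefix is hypothetical. Note that the intermediate frame still holds one column per (column, value) pair before the merge, so this shrinks the final result rather than the peak memory.

```python
import pandas as pd


def dummies_without_prefix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one indicator column per distinct value across all columns."""
    # prefix='' and prefix_sep='' name each dummy column after the value alone,
    # so 'x' from d_1 and 'x' from d_2 both end up as a column called 'x'.
    dummies = pd.get_dummies(df, prefix='', prefix_sep='')
    # Transpose so the duplicated column names become the index, collapse the
    # duplicates with max (1 if the value appears in any column), transpose back.
    merged = dummies.T.groupby(level=0).max().T
    return merged.astype(int)


d = pd.DataFrame({'d_1': ['a', 'x'], 'd_2': ['x', 'y']})
# Expected: columns a, x, y with rows [1, 1, 0] and [0, 1, 1]
print(dummies_without_prefix(d))
```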

Answer 1

You can try something like this:

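import pandas as pd
d = pd.DataFrame({'d_1': ['a', 'x'], 'd_2': ['x', 'y']})
# Each row becomes a Series indexed by its own values (all 1); labels a row lacks become NaN, then 0.
d.apply(lambda x: pd.Series(1, index=x), axis=1).fillna(0)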

#     a   x   y
#0  1.0 1.0 0.0
#1  0.0 1.0 1.0
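The apply call above builds a Python Series for every row, which can be slow on large frames. A vectorized alternative, sketched here under the assumption of pandas 1.x or 2.x, stacks all columns into one long Series so that get_dummies emits a single dummy column per distinct value, then collapses the result back to one row per original index label. The helper name dummies_via_stack is hypothetical.

```python
import pandas as pd


def dummies_via_stack(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: shared indicator columns without a per-row Python lambda."""
    # stack() turns the frame into one long Series indexed by (row, column),
    # so get_dummies sees a single column and emits one dummy per distinct value.
    stacked = df.stack()
    dummies = pd.get_dummies(stacked)
    # Collapse back to one row per original row label (level 0 of the MultiIndex);
    # max gives 1 if the value occurred in any of that row's columns.
    return dummies.groupby(level=0).max().astype(int)


d = pd.DataFrame({'d_1': ['a', 'x'], 'd_2': ['x', 'y']})
print(dummies_via_stack(d))
```

For the example frame this should give the a/x/y table from the question, with integer 0/1 values rather than the floats produced by fillna(0).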

Answer 1 (accepted, 2016-11-16 16:09:27)
