How to reverse dummy variables in a pandas dataframe

Question

I would like to reverse the dummy-variable encoding of a dataframe. For example,

from df_input:

Course_01  Course_02  Course_03
        0          0          1
        1          0          0
        0          1          0

to df_output:

  Course
0     03
1     01
2     02
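For anyone who wants to reproduce the question, here is a minimal sketch that builds df_input and the expected df_output. It assumes the course codes should stay zero-padded strings ('03' rather than 3), as the desired output suggests.

```python
import pandas as pd

# One-hot (dummy) input from the question: exactly one 1 per row
df_input = pd.DataFrame({'Course_01': [0, 1, 0],
                         'Course_02': [0, 0, 1],
                         'Course_03': [1, 0, 0]})

# Expected result: the suffix of the column that holds the 1,
# kept as a zero-padded string (assumption based on the desired output)
df_output = pd.DataFrame({'Course': ['03', '01', '02']})

print(df_input)
print(df_output)
```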

I have been looking at the solution provided at Reconstruct a categorical variable from dummies in pandas, but it did not work. Any help would be much appreciated.

Many thanks, best regards, Carlo

Answer 1

For data with an id column and T_30/T_40 dummy columns (the original version of the question), we can use wide_to_long, then select the rows that are not equal to zero, i.e.

ndf = pd.wide_to_long(df, stubnames='T_', i='id',j='T')

      T_
id  T     
id1 30   0
id2 30   1
id1 40   1
id2 40   0

not_dummy = ndf[ndf['T_'].ne(0)].reset_index().drop(columns='T_')

   id   T
0  id2  30
1  id1  40

Update based on your edit:

ndf = pd.wide_to_long(df.reset_index(), stubnames='T_',i='index',j='T')

not_dummy = ndf[ndf['T_'].ne(0)].reset_index(level='T').drop(columns='T_')

        T
index    
1      30
0      40
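Applied to the question's Course_ columns, the same wide_to_long idea might look like the sketch below (assuming df_input as shown in the question; long_df is just an illustrative name). One caveat: wide_to_long tries to convert the extracted suffixes to numbers, so '01' comes back as 1. The last step re-pads them, which assumes every course code is two digits wide.

```python
import pandas as pd

df_input = pd.DataFrame({'Course_01': [0, 1, 0],
                         'Course_02': [0, 0, 1],
                         'Course_03': [1, 0, 0]})

# wide_to_long needs an id column, so expose the row index as 'index'
long_df = pd.wide_to_long(df_input.reset_index(), stubnames='Course_',
                          i='index', j='Course')

# Keep only the 1s, move the suffix level out of the index, restore row order
df_output = (long_df[long_df['Course_'].ne(0)]
             .reset_index(level='Course')
             .drop(columns='Course_')
             .sort_index())

# Suffixes were cast to int (1, 2, 3); re-pad to two digits (assumption)
df_output['Course'] = df_output['Course'].astype(str).str.zfill(2)
print(df_output)
```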

Answer 2

You can use:

# move id into the index if necessary
df = df.set_index('id')
# split the column names into a MultiIndex, e.g. T_30 -> (T, 30)
df.columns = df.columns.str.split('_', expand=True)
# reshape with stack and remove the 0 rows
df = df.stack().reset_index().query('T != 0').drop(columns='T').rename(columns={'level_1': 'T'})
print (df)
    id   T
1  id1  40
2  id2  30
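The key step here is str.split('_', expand=True), which turns each prefix_suffix column label into a two-level MultiIndex, so that stack() can move the suffix level into the rows. A small illustration on the question's column names (cols and mi are illustrative names):

```python
import pandas as pd

cols = pd.Index(['Course_01', 'Course_02', 'Course_03'])

# expand=True turns each 'prefix_suffix' label into a (prefix, suffix) tuple,
# so the result is a two-level MultiIndex rather than an Index of lists
mi = cols.str.split('_', expand=True)
print(mi)
```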

EDIT:

import numpy as np

col_name = 'Course'
df.columns = df.columns.str.split('_', expand=True)
df = (df.replace(0, np.nan)
        .stack()
        .dropna()  # newer stack() implementations may keep the NaN rows
        .reset_index()
        .drop(columns=[col_name, 'level_0'])
        .rename(columns={'level_1': col_name})
)
print (df)
  Course
0     03
1     01
2     02
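To see where level_0 and level_1 come from, here is a sketch of the intermediate stacked frame, assuming the question's data. Replacing 0 with NaN marks the non-selected courses; the explicit dropna() guards against pandas versions whose stack() keeps NaN rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Course_01': [0, 1, 0],
                   'Course_02': [0, 0, 1],
                   'Course_03': [1, 0, 0]})
df.columns = df.columns.str.split('_', expand=True)

# After stacking, the suffix ('01', '02', '03') becomes the inner index level;
# both index levels are unnamed, so reset_index() calls them level_0 / level_1
stacked = df.replace(0, np.nan).stack().dropna()
print(stacked)
print(stacked.reset_index())
```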

Answer 3

Suppose you have the following dummy DF:

In [152]: d
Out[152]:
    id  T_30  T_40  T_50
0  id1     0     1     1
1  id2     1     0     1

We can prepare the following helper Series:

In [153]: v = pd.Series(d.columns.drop('id').str.replace(r'\D', '', regex=True).astype(int), index=d.columns.drop('id'))

In [155]: v
Out[155]:
T_30    30
T_40    40
T_50    50
dtype: int64

Now we can multiply them, stack, and filter:

In [154]: d.set_index('id').mul(v).stack().reset_index(name='T').drop(columns='level_1').query("T > 0")
Out[154]:
    id   T
1  id1  40
2  id1  50
3  id2  30
5  id2  50
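A self-contained version of the same steps, rebuilding d from the output shown above so it can be run directly (dummy_cols and res are illustrative names). Note that rows with several 1s, like id1 here, yield one output row per 1.

```python
import pandas as pd

d = pd.DataFrame({'id': ['id1', 'id2'],
                  'T_30': [0, 1],
                  'T_40': [1, 0],
                  'T_50': [1, 1]})

# Helper Series: numeric suffix of each dummy column, indexed by column name
dummy_cols = d.columns.drop('id')
v = pd.Series(dummy_cols.str.replace(r'\D', '', regex=True).astype(int),
              index=dummy_cols)

# 0/1 dummies times the suffix leave either 0 or the suffix value;
# stack to long form and keep the non-zero entries
res = (d.set_index('id')
        .mul(v)
        .stack()
        .reset_index(name='T')
        .drop(columns='level_1')
        .query('T > 0'))
print(res)
```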

Answer 4

I think melt() was pretty much made for this.

Your data, I think:

df_input = pd.DataFrame.from_dict({'Course_01':[0,1,0],
                               'Course_02':[0,0,1],
                               'Course_03':[1,0,0]})

Change the column names to match your desired output:

df_input.columns = df_input.columns.str.replace('Course_','')

Melt the dataframe:

dataMelted = pd.melt(df_input,  
                    var_name='Course', 
                    ignore_index=False)

Clean up the zeros, etc.:

df_output = (dataMelted[dataMelted['value'] != 0]
            .drop('value', axis=1)
            .sort_index())

>>> df_output
  Course
0     03
1     01
2     02
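On pandas 1.5 or later there is also a built-in inverse of get_dummies, pd.from_dummies, which performs the whole reversal in one call. A minimal sketch, assuming the question's df_input and that every row has exactly one 1:

```python
import pandas as pd

df_input = pd.DataFrame({'Course_01': [0, 1, 0],
                         'Course_02': [0, 0, 1],
                         'Course_03': [1, 0, 0]})

# sep='_' splits each label into a prefix ('Course'), which becomes the
# output column name, and a category ('01', '02', '03'), which becomes the value.
# Requires pandas >= 1.5.
df_output = pd.from_dummies(df_input, sep='_')
print(df_output)
```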

Answer 5

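# Create a new column for the categorical
# (initialised as strings so the assignments below keep one dtype)
df['categ'] = ''
for i in df.index:
    if df.loc[i, 'Course_01'] == 1:
        df.loc[i, 'categ'] = '01'
    if df.loc[i, 'Course_02'] == 1:
        df.loc[i, 'categ'] = '02'
    if df.loc[i, 'Course_03'] == 1:
        df.loc[i, 'categ'] = '03'
df['categ'] = df['categ'].astype('category')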
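The same mapping can be done without the row-by-row loop. The sketch below swaps in a vectorized technique, DataFrame.idxmax(axis=1), and assumes each row contains exactly one 1 (on an all-zero row idxmax would silently return the first column).

```python
import pandas as pd

df = pd.DataFrame({'Course_01': [0, 1, 0],
                   'Course_02': [0, 0, 1],
                   'Course_03': [1, 0, 0]})

# idxmax(axis=1) gives, per row, the label of the first column holding the max (the 1)
labels = df.idxmax(axis=1)

# Strip the prefix to keep only the course code, then store it as a categorical
df['categ'] = labels.str.replace('Course_', '', regex=False).astype('category')
print(df)
```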

5 answers

Answer 1: score 5, accepted, 2017-12-07 12:02:46
Answer 2: score 3, 2017-12-07 12:15:55
Answer 3: score 2, 2017-12-07 11:57:33
Answer 4: score 0, 2021-01-15 16:44:01
Answer 5: score 0, 2022-10-03 01:22:44
