How to rename one-hot encoded columns in pandas to their respective index?

I'm one-hot encoding some categorical variables with some code that was provided to me. This line adds columns of 0s and 1s whose names have the format prefix_categoricalValue:

dataframe = pandas.concat([dataframe,pandas.get_dummies(dataframe[0], prefix='protocol')],axis=1).drop([0],axis=1)

I want each column to be named after its index, not prefix_categoricalValue.

I know that I can do something like df.rename(columns={'prefix_categoricalValue': '0'}, inplace=True), but I'm not sure how to do it for all the columns that have that prefix.

[Image: screenshot of part of the dataframe]

This is an example of a part of the dataframe. Whether I decide to keep the local_address prefix or not, each category will have its own column name. Is it possible to rename each column to its index?

EDIT:

I'm trying to do this:

for column in dataframe:
    dataframe.rename(columns={column: 'new_name'}, inplace=True)
    print(column)

but I'm not exactly sure why it doesn't work.

import pandas as pd

# 'dataframe' is the name of your data frame in the question, so that's what I use
# in my code below, although I suggest using 'data' or something similar instead,
# as 'DataFrame' is the name of the pandas class and the two are easy to confuse. But anyway...

features = ['list of column names you want one-hot encoded']
# for example, features = ['Cars', 'Model', 'Year', ... ]

for f in features: 
    df = dataframe[[f]]

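    # df holds a single column, so the dummy column names are already unique;
    # the former .max(level=0, axis=1) step is dropped because the 'level'
    # argument of DataFrame.max was removed in pandas 2.0
    df2 = (pd.get_dummies(df, prefix='', prefix_sep='')
                   .add_prefix(f+' - '))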
    # the new feature names will be "<old_feature_name> - <categorical_value>"
    # for example, "Cars" will get transformed to "Cars - Minivan", "Cars - Truck", etc


    # add the new one-hot encoded column to the dataframe
    dataframe = pd.concat([dataframe, df2], axis=1)

    # you can remove the original columns, if you don't need them anymore (optional)
    dataframe = dataframe.drop([f], axis=1) 
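
As a rough illustration of what the loop above produces, here is a minimal run on made-up data. The sample values and the "Year" column are assumptions for the example only, and the sketch assumes a recent pandas version.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
dataframe = pd.DataFrame({
    "Cars": ["Minivan", "Truck", "Minivan"],
    "Year": [2010, 2012, 2015],
})
features = ["Cars"]

for f in features:
    # one 0/1 column per category, named "<feature> - <category>"
    dummies = (pd.get_dummies(dataframe[[f]], prefix="", prefix_sep="")
                 .add_prefix(f + " - "))
    # drop the original column and append the encoded ones
    dataframe = pd.concat([dataframe.drop(columns=[f]), dummies], axis=1)

print(list(dataframe.columns))
# ['Year', 'Cars - Minivan', 'Cars - Truck']
```

The encoded columns are appended at the end, in the sorted order of the category values.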

Let's say your prefix is local_address_0.0.0.0. The following code renames every column whose name starts with that prefix to the column's positional index, that is, its position in the order in which the columns appear in the dataframe:

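prefix = 'local_address_0.0.0.0'
cols = list(dataframe)
# map each matching column name to its position; str() guards against
# non-string column labels such as integers
mapping = {val: idx for idx, val in enumerate(cols) if str(val).startswith(prefix)}
# rename all matching columns in one call; the row index is left untouched
dataframe.rename(columns=mapping, inplace=True)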

This will show a warning in the console:

python3.6/site-packages/pandas/core/frame.py:3027: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
return super(DataFrame, self).rename(**kwargs)

But it is only a warning; the column names of the dataframe are still updated. If you want to learn more about the warning, see "How to deal with SettingWithCopyWarning in Pandas?"

If someone knows how to do the same thing without a warning, please comment.
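
One possible way to avoid the warning, sketched under the assumption that it appears because dataframe was itself created by slicing another DataFrame (the question does not show that step): take an explicit copy first, and assign the result of rename instead of using inplace=True. The names raw and port below are hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the question's dataframe
raw = pd.DataFrame({
    "port": [80, 22, 443],
    "local_address_0.0.0.0": [1, 0, 1],
    "local_address_0.0.0.0:22": [0, 1, 0],
})

# An explicit copy makes the frame independent of `raw`, so pandas has
# no reason to warn about writing to a slice.
dataframe = raw[raw["port"] > 0].copy()

prefix = "local_address_0.0.0.0"
mapping = {
    col: idx
    for idx, col in enumerate(dataframe.columns)
    if str(col).startswith(prefix)
}

# rename returns a new DataFrame; assigning it avoids in-place modification
dataframe = dataframe.rename(columns=mapping)
print(list(dataframe.columns))
# ['port', 1, 2]
```

Note that integer names produced this way can collide with columns that are already labeled with integers, so it may be worth checking for duplicates afterwards.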

If I understand correctly, you can assign the new column names directly:

dummydf=pd.get_dummies(df.A)
dummydf.columns=['A']*dummydf.shape[1]
dummydf
Out[1171]: 
   A  A
0  1  0
1  0  1
2  1  0
df
Out[1172]: 
   A  B  C
0  a  b  1
1  b  a  2
2  a  c  3
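
If the goal is a positional name for each dummy column rather than a repeated 'A', the same assignment could take a range instead. This is a small sketch that rebuilds the df shown above.

```python
import pandas as pd

# Same sample frame as in the output above
df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["b", "a", "c"], "C": [1, 2, 3]})

dummydf = pd.get_dummies(df["A"])
# replace the category names ('a', 'b') with their positions 0, 1, ...
dummydf.columns = range(dummydf.shape[1])
print(list(dummydf.columns))
# [0, 1]
```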
