I'm new to pandas and still learning the fundamentals.
I tried to label-encode a column and put the result, under the same column name with an '_enc' suffix, into a new DataFrame data_enc.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

labelencoder = LabelEncoder()
new_data = data[['HeatingQC']][:35].copy()
data_enc = pd.DataFrame(labelencoder.fit_transform(new_data),
                        columns=[new_data.columns + '_enc'],
                        index=new_data.index)
print(data_enc.columns[0])
print(new_data.columns[0])
But the output is unexpected:
('HeatingQC_enc',)
HeatingQC
My question is: where do the parentheses come from, and how can I remove them?
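A self-contained reproduction makes the problem easier to test without the original dataset. The sketch below assumes a small, made-up HeatingQC column (the values are hypothetical) and uses pd.factorize(..., sort=True) as a pandas-only stand-in for LabelEncoder, which likewise maps the sorted unique labels to 0, 1, 2, ...; only the columns argument matters for the bug.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `data` DataFrame
data = pd.DataFrame({'HeatingQC': ['Ex', 'Gd', 'TA', 'Ex', 'Fa']})
new_data = data[['HeatingQC']][:35].copy()

# Stand-in for LabelEncoder: integer codes for the sorted unique labels
codes, _ = pd.factorize(new_data['HeatingQC'], sort=True)

# Same construction as in the question: the column names are wrapped in a list
data_enc = pd.DataFrame(codes,
                        columns=[new_data.columns + '_enc'],
                        index=new_data.index)

print(type(data_enc.columns).__name__)  # MultiIndex
print(data_enc.columns[0])              # ('HeatingQC_enc',)
print(new_data.columns[0])              # HeatingQC
```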
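The problem is how you created the columns of data_enc: you passed a list that contains an Index object. Because of this nesting, pandas treats the list as a list of arrays (one per level) and builds a MultiIndex from it. The result is a degenerate MultiIndex: it has only a single level, so it really shouldn't exist, and each of its labels is a one-element tuple, which is exactly what print shows as ('HeatingQC_enc',).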
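The nesting rule can be checked directly. In the sketch below (the Index contents are made up), a single-element list gives the same one-level MultiIndex as MultiIndex.from_arrays, while a two-element list gives an ordinary two-level MultiIndex, which is presumably the use case this interpretation exists for.

```python
import pandas as pd

names = pd.Index(['HeatingQC']) + '_enc'

# A list holding one Index: pandas treats it as the array for one level
one_level = pd.DataFrame(columns=[names]).columns
print(one_level.nlevels)                                      # 1
print(one_level.equals(pd.MultiIndex.from_arrays([names])))   # True

# A list holding two Index objects builds a genuine two-level MultiIndex
two_level = pd.DataFrame(columns=[names, pd.Index(['raw'])]).columns
print(two_level.nlevels)   # 2
print(two_level[0])        # ('HeatingQC_enc', 'raw')
```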
Example:
import pandas as pd

df = pd.DataFrame(columns=list('abc'))

# Placing the Index in a list incorrectly leads to a MultiIndex
pd.DataFrame(columns=[df.columns + '_suffix']).columns
# MultiIndex([('a_suffix',),
#             ('b_suffix',),
#             ('c_suffix',)],
#            )

# Instead, get rid of the list and just add the suffix:
pd.DataFrame(columns=df.columns + '_suffix').columns
# Index(['a_suffix', 'b_suffix', 'c_suffix'], dtype='object')
Applied to your code, that means passing columns=new_data.columns + '_enc'.
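If a DataFrame already has such a one-level MultiIndex, it can also be flattened after the fact. A minimal sketch, assuming the columns were built the problematic way: get_level_values(0) returns the labels of the only level as a plain Index.

```python
import pandas as pd

df = pd.DataFrame(columns=list('abc'))
# Columns built the problematic way: a list wrapping an Index
fixed = pd.DataFrame(columns=[df.columns + '_suffix'])

# Replace the one-level MultiIndex with the plain labels of its only level
fixed.columns = fixed.columns.get_level_values(0)

print(fixed.columns[0])  # a_suffix
```

For the question's frame this would be data_enc.columns = data_enc.columns.get_level_values(0), though fixing the construction itself is the cleaner option.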
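How about new_data = data['HeatingQC'][:35].copy() instead of indexing the DataFrame with a list? That way you get a single Series. Note that a Series has no .columns attribute, so build the new name from new_data.name instead, e.g. columns=[new_data.name + '_enc']; a list of plain strings gives an ordinary Index rather than a MultiIndex.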
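A sketch of the Series-based variant, again with hypothetical sample values and pd.factorize(..., sort=True) standing in for LabelEncoder so that only pandas is needed. The new column name is taken from the Series' name attribute and used as a plain string key, so no MultiIndex is created.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `data` DataFrame
data = pd.DataFrame({'HeatingQC': ['Ex', 'Gd', 'TA', 'Ex', 'Fa']})

# Single brackets select a Series, not a one-column DataFrame
new_data = data['HeatingQC'][:35].copy()

# Stand-in for LabelEncoder().fit_transform(new_data)
codes, _ = pd.factorize(new_data, sort=True)

# A Series has .name instead of .columns; a plain string key gives a plain Index
data_enc = pd.DataFrame({new_data.name + '_enc': codes}, index=new_data.index)

print(data_enc.columns[0])  # HeatingQC_enc
```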
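The parentheses are there because data_enc.columns is a MultiIndex, so data_enc.columns[0] is a tuple. To print only the string inside it, run:
print(data_enc.columns[0][0])
instead of:
print(data_enc.columns[0])
This only changes what is printed; the columns themselves remain a MultiIndex.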
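The difference between the two print calls comes down to indexing into a one-element tuple. A small sketch with a made-up 'abc' frame: columns[0][0] yields the plain string, but the columns object itself is unchanged.

```python
import pandas as pd

df = pd.DataFrame(columns=list('abc'))
nested = pd.DataFrame(columns=[df.columns + '_suffix'])

label = nested.columns[0]   # a one-element tuple
print(label)                # ('a_suffix',)
print(label[0])             # a_suffix

# Indexing into the tuple does not change the DataFrame: it is still a MultiIndex
print(type(nested.columns).__name__)  # MultiIndex
```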