
Pandas for Python: Exception: Data must be 1-dimensional

This is what I got from a tutorial:

# Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

# Encoding categorical data
# Encoding the Independent Variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
# Encoding the Dependent Variable
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

This is the X matrix with the encoded dummy variables:

1.000000000000000000e+00    0.000000000000000000e+00    0.000000000000000000e+00    4.400000000000000000e+01    7.200000000000000000e+04
0.000000000000000000e+00    0.000000000000000000e+00    1.000000000000000000e+00    2.700000000000000000e+01    4.800000000000000000e+04
0.000000000000000000e+00    1.000000000000000000e+00    0.000000000000000000e+00    3.000000000000000000e+01    5.400000000000000000e+04
0.000000000000000000e+00    0.000000000000000000e+00    1.000000000000000000e+00    3.800000000000000000e+01    6.100000000000000000e+04
0.000000000000000000e+00    1.000000000000000000e+00    0.000000000000000000e+00    4.000000000000000000e+01    6.377777777777778101e+04
1.000000000000000000e+00    0.000000000000000000e+00    0.000000000000000000e+00    3.500000000000000000e+01    5.800000000000000000e+04
0.000000000000000000e+00    0.000000000000000000e+00    1.000000000000000000e+00    3.877777777777777857e+01    5.200000000000000000e+04
1.000000000000000000e+00    0.000000000000000000e+00    0.000000000000000000e+00    4.800000000000000000e+01    7.900000000000000000e+04
0.000000000000000000e+00    1.000000000000000000e+00    0.000000000000000000e+00    5.000000000000000000e+01    8.300000000000000000e+04
1.000000000000000000e+00    0.000000000000000000e+00    0.000000000000000000e+00    3.700000000000000000e+01    6.700000000000000000e+04

The problem is that there are no column labels. I tried:

something = pd.get_dummies(X)

But I get the following exception:

Exception: Data must be 1-dimensional

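Most sklearn methods don't care about column names, since they are mainly concerned with the math behind the ML algorithms they implement. If you can work out the label encoding ahead of time, you can add the column names back onto the OneHotEncoder output after fit_transform().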
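As a side note on the error itself: pd.get_dummies() expects 1-D data (or a DataFrame), so passing it the 2-D array X fails. The sketch below is not the approach used in this answer; it only illustrates, on a small hypothetical stand-in for Data.csv (the column names Country, Age and Salary are assumptions), that calling get_dummies() on the original DataFrame keeps the labels.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv; the real file is not shown in the question.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, 30.0, 38.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0],
})

X = df.values  # 2-D NumPy array, like X in the question

# get_dummies() on a 2-D array raises, because it tries to build a Series from it.
try:
    pd.get_dummies(X)
    failed = False
except Exception as exc:
    failed = True
    print(type(exc).__name__, exc)

# On the DataFrame itself, the dummy columns are named after the categories.
dummies = pd.get_dummies(df, columns=["Country"])
print(dummies.columns.tolist())
# ['Age', 'Salary', 'Country_France', 'Country_Germany', 'Country_Spain']
```

Note that get_dummies() places the untouched columns first and the dummy columns after them, which differs from the column order in the X matrix above.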

First, grab the column names of your predictors from the original dataset, excluding the first one (which we reserve for LabelEncoder):

X_cols = dataset.columns[1:-1]
X_cols
# Index(['Age', 'Salary'], dtype='object')

Now get the order of the encoded labels. In this particular case, it looks like LabelEncoder() organizes its integer mapping alphabetically:

labels = labelencoder_X.fit(X[:, 0]).classes_ 
labels
# ['France' 'Germany' 'Spain']
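To make the mapping explicit, here is a minimal sketch on a small hypothetical array of country names (not the OP's data). It relies on LabelEncoder storing its classes in sorted order in classes_, so that the position of a label in classes_ is the integer code it receives.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical input, deliberately not in alphabetical order.
countries = np.array(["Spain", "France", "Germany", "France"])

enc = LabelEncoder()
codes = enc.fit_transform(countries)

# classes_ is sorted; the index of each label is its integer code.
mapping = {label: code for code, label in enumerate(enc.classes_)}

print(enc.classes_)  # ['France' 'Germany' 'Spain']
print(codes)         # [2 0 1 0]
print(mapping)       # {'France': 0, 'Germany': 1, 'Spain': 2}
```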

Combine these column names, and then add them to X when you convert it to a DataFrame:

# X gets re-used, so make sure to define encoded_cols after this line
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
encoded_cols = np.append(labels, X_cols)
# ...
X = onehotencoder.fit_transform(X).toarray()
encoded_df = pd.DataFrame(X, columns=encoded_cols)

encoded_df
   France  Germany  Spain        Age        Salary
0     1.0      0.0    0.0  44.000000  72000.000000
1     0.0      0.0    1.0  27.000000  48000.000000
2     0.0      1.0    0.0  30.000000  54000.000000
3     0.0      0.0    1.0  38.000000  61000.000000
4     0.0      1.0    0.0  40.000000  63777.777778
5     1.0      0.0    0.0  35.000000  58000.000000
6     0.0      0.0    1.0  38.777778  52000.000000
7     1.0      0.0    0.0  48.000000  79000.000000
8     0.0      1.0    0.0  50.000000  83000.000000
9     1.0      0.0    0.0  37.000000  67000.000000

Note: as example data I'm using a dataset that looks very similar or identical to the one OP used. Note how the output matches OP's X matrix.
