How to solve pandas.get_dummies Exception: Data must be 1-dimensional

I am trying to read the wages dataset (Wage.csv) and then bin the age column. However, I get an exception saying that the data must be 1-dimensional.

The code is reproduced below and the dataset link is given.

# import modules
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
%matplotlib inline

# read data_set
data = pd.read_csv("Wage.csv")
data.head()

data_x = data['age']
data_y = data['wage']

# Dividing data into train and validation datasets
from sklearn.model_selection import train_test_split
train_x, valid_x, train_y, valid_y = train_test_split(data_x, data_y, test_size=0.33, random_state=1)

# Dividing the data into 4 bins
df_cut, bins = pd.cut(train_x, 4, retbins=True, right=True)
df_cut.value_counts(sort=False)

df_steps = pd.concat([train_x, df_cut, train_y], keys=['age', 'age_cuts', 'wage'], axis=1)

# Create dummy variables for the age groups
df_steps_dummies = pd.get_dummies(df_cut)
df_steps_dummies.head()

df_steps_dummies.columns = ['17.938-33.5', '33.5-49', '49-64.5', '64.5-80']

# Fitting Generalised linear models
fit3 = sm.GLM(df_steps.wage, df_steps_dummies).fit()

# Binning validation set into same 4 bins
bin_mapping = np.digitize(valid_x, bins)
X_valid = pd.get_dummies(bin_mapping)

I am getting the exception: Exception: Data must be 1-dimensional

If you look at the data, it has the form [1] [2] … [3], that is, each value sits in its own single-element row.

You need to get it into a flat form like [1 2 … 3].
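To make the shape difference concrete, here is a small sketch with made-up values. The names `edges` and `ages_2d` and the numbers in them are illustrative assumptions, not taken from the dataset. It relies on the fact that `np.digitize` returns an array of the same shape as its input, so a column-shaped input yields column-shaped bin indices.

```python
import numpy as np

# Hypothetical bin edges and ages, chosen only for illustration
edges = np.array([18.0, 33.5, 49.0, 64.5, 80.0])
ages_2d = np.array([[25], [40], [70]])  # shape (3, 1): one row per sample

# np.digitize preserves the input shape, so the result is still 2-D
idx_2d = np.digitize(ages_2d, edges)

# Flattening the input first gives a 1-D array of bin indices
idx_1d = np.digitize(ages_2d.ravel(), edges)

print(idx_2d.shape)  # (3, 1)
print(idx_1d.shape)  # (3,)
```

Checking `.shape` on the array passed to `pd.get_dummies` is a quick way to see which of the two forms you have.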

Flattening the data into a single list and then putting it back into an np.array works.

For example:

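def binMapping(x):
    # np.digitize keeps the shape of its input, so a column-shaped x
    # gives one single-element row per sample; collect the elements
    # into a flat list and rebuild a 1-D array from it
    flat = []
    prestep = np.digitize(x, bins)
    for sublist in prestep:
        for ele in sublist:
            flat.append(ele)
    return np.array(flat)

bin_mapping = binMapping(valid_x)
X_valid = pd.get_dummies(bin_mapping)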

This works, though I'm sure there's a better way to do it.
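One possible shorter alternative is to flatten the input with NumPy before digitizing, instead of looping over the result. This is only a sketch: `bin_dummies`, `edges`, and `valid_ages` are hypothetical names with made-up values, and it assumes the input holds a single column of numbers.

```python
import numpy as np
import pandas as pd

def bin_dummies(x, bins):
    """Hypothetical helper: digitize x into bins and one-hot encode the result."""
    flat = np.asarray(x).ravel()           # force a 1-D array, whatever shape x had
    bin_mapping = np.digitize(flat, bins)  # 1-D array of bin indices
    return pd.get_dummies(bin_mapping)

# Illustrative data only; not the Wage dataset
edges = np.array([18.0, 33.5, 49.0, 64.5, 80.0])
valid_ages = pd.DataFrame({"age": [25, 40, 70, 30]})  # 2-D, shape (4, 1)

X_valid = bin_dummies(valid_ages, edges)
print(X_valid)
```

Note that a bin with no samples produces no column in the output, so the result may need to be reindexed against the training columns before it is passed to a fitted model.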
