
ValueError: setting an array element with a sequence?

Why am I getting this error message?

Here are the variables used in my code. The columns they select are all dummy variables:

country_cols = wine_dummies.loc[:, 'country_Chile':'country_US']
variety_cols = wine_dummies.loc[:, 'variety_Cabernet Sauvignon':'variety_Zinfandel']
pricecat_cols = wine_dummies.loc[:, 'price_category_low':]

Here is the code that throws the error (it is raised at "X = wine[feature_cols_1]"):

feature_cols_1 = ['price', country_cols, variety_cols, 'year']
feature_cols_2 = [pricecat_cols, country_cols, variety_cols, 'year']

X = wine[feature_cols_1]  # <--- ERROR
y = wine['points']

Here is the head of my dataframe:

country designation points  price   province    variety      year   ... variety_Riesling    variety_Rosé    variety_Sangiovese  variety_Sauvignon Blanc variety_Syrah   variety_Tempranillo variety_White Blend variety_Zinfandel   price_category_low  price_category_med
Portugal    Avidagos    87  15.0    Douro   Portuguese Red  2011.0  ... 0  0    0   0   0   0   0   0   1 0    

^ Each value (0 or 1) after "..." in the row belongs to the dummy column at the same position after "..." in the header. ^

This is actually quite cumbersome, so it's only going to be useful if you have lots of columns between 'country_Chile' and 'country_US'. In the example below, I deliberately leave column a out of middle_columns by slicing on column positions.

This uses pandas.Index.get_loc to find the positions of the start and end columns, which can then be used as a slice on the full list of dataframe columns. The * operator then unpacks that slice into the final list of columns.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [3, 4, 5], 
                   'd': [4, 5, 6], 'wine': ['happy', 'drunk', 'sad'],
                   'year': [2002, 2003, 2019]})

middle_columns = df.columns[df.columns.get_loc('b'):df.columns.get_loc('d')+1]
all_cols = ['wine', *middle_columns, 'year']
X = df[all_cols]
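If you need this for several column ranges, the same idea could be wrapped in a small helper. This is only a sketch: columns_between is a name I made up (it is not a pandas function), and it assumes the column labels are unique so that get_loc returns a single integer. It also shows that a label slice with .loc, which includes both endpoints, should give the same labels.

```python
import pandas as pd

def columns_between(frame, first, last):
    """Hypothetical helper: labels from `first` through `last`, inclusive.

    Assumes column labels are unique, so get_loc returns an int.
    """
    start = frame.columns.get_loc(first)
    stop = frame.columns.get_loc(last) + 1  # +1 so `last` is included
    return list(frame.columns[start:stop])

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [3, 4, 5],
                   'd': [4, 5, 6], 'wine': ['happy', 'drunk', 'sad'],
                   'year': [2002, 2003, 2019]})

# Positional slice via get_loc
middle = columns_between(df, 'b', 'd')

# Label slice via .loc, which includes both endpoints
loc_middle = list(df.loc[:, 'b':'d'].columns)

# Unpack the labels into the final column list
X = df[['wine', *middle, 'year']]
```

Either way, what ends up in the final list is plain column labels, not dataframes.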

The reason your current approach doesn't work is that feature_cols_1 = ['price', country_cols, variety_cols, 'year'] builds a list that mixes strings and dataframes, which you then try to use as the column selection on a second dataframe.
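Applied to your case, one possible fix is to put the column names of those dataframes into the list rather than the dataframes themselves. The sketch below uses a tiny made-up wine frame (the values and the reduced set of dummy columns are invented for illustration), and it assumes the dummy columns are present in wine itself, as the head you posted suggests.

```python
import pandas as pd

# Toy stand-in for the real data; values are made up for illustration.
wine = pd.DataFrame({
    'price': [15.0, 30.0],
    'year': [2011.0, 2013.0],
    'points': [87, 90],
    'country_Chile': [0, 1],
    'country_US': [1, 0],
    'variety_Syrah': [1, 0],
    'variety_Zinfandel': [0, 1],
})

# These are dataframes, as in the question
country_cols = wine.loc[:, 'country_Chile':'country_US']
variety_cols = wine.loc[:, 'variety_Syrah':'variety_Zinfandel']

# Unpack the column *labels*, so the list contains only strings
feature_cols_1 = ['price', *country_cols.columns, *variety_cols.columns, 'year']

X = wine[feature_cols_1]
y = wine['points']
```

With a flat list of strings, wine[feature_cols_1] is an ordinary column selection.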
