When I use pd.crosstab it keeps showing AssertionError
When I use pd.crosstab to build confusion matrices, it keeps showing
AssertionError: arrays and names must have the same length
import pandas as pd
import numpy as np
from sklearn.metrics import confusion_matrix
import random
df = pd.read_csv('C:\\Users\\liukevin\\Desktop\\winequality-red.csv',sep=';', usecols=['fixed acidity','volatile acidity','citric acid','residual sugar','chlorides','free sulfur dioxide','total sulfur dioxide','density','pH','sulphates','alcohol','quality'])
Q=[]
for i in range(len(df)):
    if df['quality'][i]<=5:
        Q.append('Low')
    else:
        Q.append('High')
del df['quality']
test_number=sorted(random.sample(xrange(len(df)), int(len(df)*0.25)))
train_number=[]
temp=[]
for i in range(len(df)):
    temp.append(i)
train_number=list(set(temp)-set(test_number))
distance_all=[]
for i in range(len(test_number)):
    distance_sep=[]
    for j in range(len(train_number)):
        distance=pow(df['fixed acidity'][test_number[i]]-df['fixed acidity'][train_number[j]],2)+\
                 pow(df['volatile acidity'][test_number[i]]-df['volatile acidity'][train_number[j]],2)+\
                 pow(df['citric acid'][test_number[i]]-df['citric acid'][train_number[j]],2)+\
                 pow(df['residual sugar'][test_number[i]]-df['residual sugar'][train_number[j]],2)+\
                 pow(df['chlorides'][test_number[i]]-df['chlorides'][train_number[j]],2)+\
                 pow(df['free sulfur dioxide'][test_number[i]]-df['free sulfur dioxide'][train_number[j]],2)+\
                 pow(df['total sulfur dioxide'][test_number[i]]-df['total sulfur dioxide'][train_number[j]],2)+\
                 pow(df['density'][test_number[i]]-df['density'][train_number[j]],2)+\
                 pow(df['pH'][test_number[i]]-df['pH'][train_number[j]],2)+\
                 pow(df['sulphates'][test_number[i]]-df['sulphates'][train_number[j]],2)+\
                 pow(df['alcohol'][test_number[i]]-df['alcohol'][train_number[j]],2)
        distance_sep.append(distance)
    distance_all.append(distance_sep)
for round in range(5):
    K=2*round+1
    select_neighbor_all=[]
    for i in range(len(test_number)):
        select_neighbor_sep=np.argsort(distance_all[i])[:K]
        select_neighbor_all.append(select_neighbor_sep)
    prediction=[]
    Q_test=[]
    for i in range(len(test_number)):
        Q_test.append(Q[test_number[i]])
        #original data
        Low_count=0
        for j in range(K):
            if Q[train_number[select_neighbor_all[i][j]]]=='Low':
                Low_count+=1
        if Low_count>(K/2):
            prediction.append('Low')
        else:
            prediction.append('High')
    print pd.crosstab(Q_test, prediction, rownames=['Actual'], colnames=['Predicted'], margins=True)
But aren't Q_test and prediction the same length? I guess the problem might be the part that says "names" must have the same length, because I am not really sure what it means. (The Q_test and prediction arrays contain only the binary labels 'Low' and 'High'.) select_neighbor_all is what I use to select the K nearest neighbors of the i-th test sample.
It appears that you may not be providing all the data that pd.crosstab needs to perform the necessary calculations.
Take a look at this example. Here we provide an index AND two column categories AND rownames and colnames:
>>> index = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
... "bar", "bar", "foo", "foo", "foo"], dtype=object)
>>> col_category_1 = np.array(["one", "one", "one", "two", "one", "one",
... "one", "two", "two", "two", "one"], dtype=object)
>>> col_category_2 = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
... "shiny", "dull", "shiny", "shiny", "shiny"],
... dtype=object)
# Notice the index AND the columns provided as a list
>>> pd.crosstab(index, [col_category_1, col_category_2],
...             rownames=['a'], colnames=['b', 'c'])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2
For more details, see the pandas documentation for pd.crosstab:
index : array-like, Series, or list of arrays/Series. Values to group by in the rows.
columns : array-like, Series, or list of arrays/Series. Values to group by in the columns.
rownames : sequence, default None. If passed, must match number of row arrays passed.
colnames : sequence, default None. If passed, must match number of column arrays passed.
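To make the rownames/colnames rule concrete, here is a minimal sketch with made-up data. The arrays actual, predicted, and batch are hypothetical and are not taken from the question; the point is only that each name list needs one entry per array passed on that axis.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, used only for illustration.
actual = np.array(["Low", "High", "Low", "High"])
predicted = np.array(["Low", "Low", "High", "High"])
batch = np.array(["a", "a", "b", "b"])  # hypothetical second row key

# One row array and one column array: one name on each axis.
single = pd.crosstab(actual, predicted,
                     rownames=["Actual"], colnames=["Predicted"])

# Two row arrays passed as a list: rownames needs two entries.
double = pd.crosstab([actual, batch], predicted,
                     rownames=["Actual", "Batch"], colnames=["Predicted"])

print(single)
print(double)
```

If the number of names does not match the number of arrays on an axis, pd.crosstab raises the "arrays and names must have the same length" error.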
If you edit the following line and include the correct inputs, it should solve your problem...
# You will need to provide an index and columns...
# Here, 'Q_test' is being interpreted as your index
# 'prediction' is being used as a column...
pd.crosstab(Q_test, prediction,
rownames=['Actual'],
colnames=['Predicted'],
margins=True)
I just spent some time resolving this. In my case, the problem was that pandas crosstab does not seem to work with plain lists.
If you convert your lists to numpy arrays, it should work fine.
So in your case it would be:
pd.crosstab(np.array(Q_test), np.array(prediction), rownames=['Actual'],
colnames=['Predicted'], margins=True)
An example:
>>> import pandas as pd
>>> import numpy as np
>>> classifications = ['foo', 'bar', 'foo', 'bar']
>>> predictions = ['foo', 'foo', 'bar', 'bar']
>>> pd.crosstab(classifications, predictions, rownames=['Actual'], colnames=['Predicted'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bastian/miniconda3/envs/machine_learning/lib/python3.6/site-packages/pandas/core/reshape/pivot.py", line 563, in crosstab
    rownames = _get_names(index, rownames, prefix="row")
  File "/home/bastian/miniconda3/envs/machine_learning/lib/python3.6/site-packages/pandas/core/reshape/pivot.py", line 703, in _get_names
    raise AssertionError("arrays and names must have the same length")
AssertionError: arrays and names must have the same length
>>> pd.crosstab(np.array(classifications), np.array(predictions), rownames=['Actual'], colnames=['Predicted'])
Predicted  bar  foo
Actual
bar          1    1
foo          1    1
I think this happens because crosstab treats a plain Python list as a list of separate arrays, one per element, rather than as a single array, so the number of arrays no longer matches the single name passed in rownames.
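As an alternative to np.array, wrapping the lists in named pd.Series objects should also work, and then rownames and colnames can be left out because the Series names are used as labels. This is only a minimal sketch: the Q_test and prediction values below are made up for illustration and stand in for the lists built in the question.

```python
import pandas as pd

# Made-up stand-ins for the lists built in the question.
Q_test = ["Low", "High", "Low", "High", "Low"]
prediction = ["Low", "Low", "Low", "High", "High"]

# A named Series is treated as a single array, and its name labels the axis.
actual = pd.Series(Q_test, name="Actual")
predicted = pd.Series(prediction, name="Predicted")

table = pd.crosstab(actual, predicted, margins=True)
print(table)
```

Both Series get the same default integer index here, so the rows line up by position.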