
TypeError: unhashable type: 'numpy.ndarray' when trying to create scatter plot from dataset

I am trying to create a scatter plot using a dataset on movies. The goal is to look at the correlation between the different categories and the target variable: whether or not the movie won an award. I have called type() on my variables, and neither of them appears to be a numpy.ndarray, as they are both pandas DataFrames, yet I still get the following error when I try to create a scatter plot:

TypeError: unhashable type: 'numpy.ndarray'

My code is as follows:

import pandas as pd
import matplotlib.pyplot as plt

file=pd.read_csv('academy_awards.csv',sep=',',error_bad_lines=False,encoding="ISO 8859-1")
print(file)
df=pd.DataFrame(file)

#df=df.dropna(axis=0,how='any')
target=df.Category
X=pd.DataFrame(df.Won)

y=target
#print(type(X))
#print(type(y))

plt.scatter(X,y)

The following are the first few lines of the dataset I am using:

Year,Category,Nominee,Additional Info,Won
2010 (83rd),Actor -- Leading Role,Javier Bardem,Biutiful {'Uxbal'},NO
2010 (83rd),Actor -- Leading Role,Jeff Bridges,True Grit {'Rooster Cogburn'},NO
2010 (83rd),Actor -- Leading Role,Jesse Eisenberg,The Social Network {'Mark Zuckerberg'},NO
2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES
2010 (83rd),Actor -- Leading Role,James Franco,127 Hours {'Aron Ralston'},NO
2010 (83rd),Actor -- Supporting Role,Christian Bale,The Fighter {'Dicky Eklund'},YES

Any help or suggestions are greatly appreciated!

Edit: The following is the full traceback:

-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call 
last)
<ipython-input-211-efcb7c41bca1> in <module>
     14 print(y.shape)
     15 
---> 16 plt.scatter(X,y)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/pyplot.py in scatter(x, y, s, c, marker, cmap, 
norm, vmin, vmax, alpha, linewidths, verts, edgecolors, data, **kwargs)
   2862         vmin=vmin, vmax=vmax, alpha=alpha, 
linewidths=linewidths,
   2863         verts=verts, edgecolors=edgecolors, **({"data": data} 
if data
-> 2864         is not None else {}), **kwargs)
   2865     sci(__ret)
   2866     return __ret

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1808                         "the Matplotlib list!)" % (label_namer, 
func.__name__),
   1809                         RuntimeWarning, stacklevel=2)
-> 1810             return func(ax, *args, **kwargs)
   1811 
   1812         inner.__doc__ = _add_data_doc(inner.__doc__,

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, 
cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4170             edgecolors = 'face'
   4171 
-> 4172         self._process_unit_info(xdata=x, ydata=y, 
kwargs=kwargs)
   4173         x = self.convert_xunits(x)
   4174         y = self.convert_yunits(y)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/axes/_base.py in _process_unit_info(self, xdata, 
ydata, kwargs)
   2133             return kwargs
   2134 
-> 2135         kwargs = _process_single_axis(xdata, self.xaxis, 
'xunits', kwargs)
   2136         kwargs = _process_single_axis(ydata, self.yaxis, 
'yunits', kwargs)
   2137         return kwargs

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/axes/_base.py in _process_single_axis(data, axis, 
unit_name, kwargs)
   2116                 # We only need to update if there is nothing 
set yet.
   2117                 if not axis.have_units():
-> 2118                     axis.update_units(data)
   2119 
   2120             # Check for units in the kwargs, and if present 
update axis

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/axis.py in update_units(self, data)
   1471         neednew = self.converter != converter
   1472         self.converter = converter
-> 1473         default = self.converter.default_units(data, self)
   1474         if default is not None and self.units is None:
   1475             self.set_units(default)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/category.py in default_units(data, axis)
    101         # default_units->axis_info->convert
    102         if axis.units is None:
--> 103             axis.set_units(UnitData(data))
    104         else:
    105             axis.units.update(data)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/category.py in __init__(self, data)
    167         self._counter = itertools.count()
    168         if data is not None:
--> 169             self.update(data)
    170 
    171     def update(self, data):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site- 
packages/matplotlib/category.py in update(self, data)
    184         data = np.atleast_1d(np.array(data, dtype=object))
    185 
--> 186         for val in OrderedDict.fromkeys(data):
    187             if not isinstance(val, (str, bytes)):
    188                 raise TypeError("{val!r} is not a 
string".format(val=val))

TypeError: unhashable type: 'numpy.ndarray'

Arrays are unhashable because they're mutable. You can make one hashable by converting it to an immutable tuple (wrap it in tuple()), but you usually shouldn't be trying to hash arrays anyway. Your data is probably the wrong shape: X = pd.DataFrame(df.Won) is a two-dimensional, one-column DataFrame, so when Matplotlib's category handling iterates over it (the OrderedDict.fromkeys(data) line in the traceback), each item is a row array rather than a string.
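To make the shape problem concrete, here is a minimal sketch using NumPy only. The names is_hashable, column_2d, and column_1d are made up for illustration; the two arrays stand in for what a one-column DataFrame and a plain Series roughly look like once they are converted to an object array.

```python
import numpy as np


def is_hashable(value):
    """Return True if hash(value) succeeds."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# Shape (3, 1): roughly what a one-column DataFrame becomes as an array.
column_2d = np.array([["NO"], ["NO"], ["YES"]], dtype=object)
# Shape (3,): roughly what a Series (a single column) becomes.
column_1d = column_2d.ravel()

# Iterating over a 2-D array yields 1-D arrays, which cannot be dict keys.
rows_hashable = [is_hashable(row) for row in column_2d]
# Iterating over a 1-D object array yields the strings themselves.
values_hashable = [is_hashable(value) for value in column_1d]

# tuple() produces an immutable, hashable copy of a row.
row_as_tuple = tuple(column_2d[0])

print(rows_hashable, values_hashable, row_as_tuple)
```

The 2-D case is the one that fails, which suggests flattening the data to one dimension rather than reaching for tuple().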

First, you don't need df=pd.DataFrame(file). pd.read_csv already returns a DataFrame, so the file variable holds your data as a DataFrame as soon as the CSV is read.
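A small sketch of that point, assuming a few rows copied from the question's sample and inlined through io.StringIO so it runs without academy_awards.csv. SAMPLE_CSV, won_series, and won_frame are illustrative names, not part of the original code.

```python
import io

import pandas as pd

# A few rows taken from the question's sample data, inlined for the sketch.
SAMPLE_CSV = """Year,Category,Nominee,Additional Info,Won
2010 (83rd),Actor -- Leading Role,Jeff Bridges,True Grit {'Rooster Cogburn'},NO
2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES
2010 (83rd),Actor -- Supporting Role,Christian Bale,The Fighter {'Dicky Eklund'},YES
"""

# read_csv already returns a DataFrame; no pd.DataFrame(...) wrapper needed.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(type(df))

# Selecting one column gives a 1-D Series ...
won_series = df["Won"]
# ... while wrapping it in pd.DataFrame makes it 2-D again (one column).
won_frame = pd.DataFrame(df.Won)

print(won_series.ndim, won_frame.shape)
```

Note that type() reports a pandas object in both cases, which is why the type check in the question did not reveal the problem; the number of dimensions is what differs.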

Then you can call the DataFrame's scatter plot directly and choose the x-axis and y-axis columns by name:

df.plot(kind ="scatter", x= "Won", y = "Category")

You don't need to preprocess the data first, because pandas has already parsed it into named columns when it read the file.
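Some pandas versions reject non-numeric columns in DataFrame.plot scatter plots, so an alternative is to call Matplotlib directly with one-dimensional columns. The following is only a sketch under these assumptions: a Matplotlib version with categorical (string) axis support, a few inlined rows from the question instead of the real file, and the non-interactive Agg backend so it runs without a display. SAMPLE_CSV and points are illustrative names.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few rows taken from the question's sample data, inlined for the sketch.
SAMPLE_CSV = """Year,Category,Nominee,Additional Info,Won
2010 (83rd),Actor -- Leading Role,Jeff Bridges,True Grit {'Rooster Cogburn'},NO
2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES
2010 (83rd),Actor -- Supporting Role,Christian Bale,The Fighter {'Dicky Eklund'},YES
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

fig, ax = plt.subplots()
# Pass Series (1-D), not pd.DataFrame(df.Won) (2-D), so each value is a string.
points = ax.scatter(df["Won"], df["Category"])
ax.set_xlabel("Won")
ax.set_ylabel("Category")

# In an interactive session, plt.show() would display the figure here.
```

Because both columns are strings, each axis is categorical; for an actual correlation measure, Won would likely need to be mapped to 0/1 first.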
