TypeError: unhashable type: 'numpy.ndarray' when trying to create a scatter plot from a dataset
I am trying to create a scatter plot using a dataset of movies. The goal is to look at the correlation between the different categories and the target variable: whether or not the movie won an award.
I have tried calling type() on my variables, and neither of them is a numpy.ndarray (X is a pandas DataFrame and y is a pandas Series), yet I still get the following error when I try to create a scatter plot:
TypeError: unhashable type: 'numpy.ndarray'
My code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
file=pd.read_csv('academy_awards.csv',sep=',',error_bad_lines=False,encoding="ISO 8859-1")
print(file)
df=pd.DataFrame(file)
#df=df.dropna(axis=0,how='any')
target=df.Category
X=pd.DataFrame(df.Won)
y=target
#print(type(X))
#print(type(y))
plt.scatter(X,y)
The following are the first few lines of the dataset I am using:
Year,Category,Nominee,Additional Info,Won
2010 (83rd),Actor -- Leading Role,Javier Bardem,Biutiful {'Uxbal'},NO
2010 (83rd),Actor -- Leading Role,Jeff Bridges,True Grit {'Rooster Cogburn'},NO
2010 (83rd),Actor -- Leading Role,Jesse Eisenberg,The Social Network {'Mark Zuckerberg'},NO
2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES
2010 (83rd),Actor -- Leading Role,James Franco,127 Hours {'Aron Ralston'},NO
2010 (83rd),Actor -- Supporting Role,Christian Bale,The Fighter {'Dicky Eklund'},YES
Any help or suggestions are greatly appreciated!
Edit: The following is the full traceback:
-----------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-211-efcb7c41bca1> in <module>
14 print(y.shape)
15
---> 16 plt.scatter(X,y)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, data, **kwargs)
2862 vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,
2863 verts=verts, edgecolors=edgecolors, **({"data": data} if data
-> 2864 is not None else {}), **kwargs)
2865 sci(__ret)
2866 return __ret
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
1808 "the Matplotlib list!)" % (label_namer, func.__name__),
1809 RuntimeWarning, stacklevel=2)
-> 1810 return func(ax, *args, **kwargs)
1811
1812 inner.__doc__ = _add_data_doc(inner.__doc__,
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
4170 edgecolors = 'face'
4171
-> 4172 self._process_unit_info(xdata=x, ydata=y, kwargs=kwargs)
4173 x = self.convert_xunits(x)
4174 y = self.convert_yunits(y)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_base.py in _process_unit_info(self, xdata, ydata, kwargs)
2133 return kwargs
2134
-> 2135 kwargs = _process_single_axis(xdata, self.xaxis, 'xunits', kwargs)
2136 kwargs = _process_single_axis(ydata, self.yaxis, 'yunits', kwargs)
2137 return kwargs
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_base.py in _process_single_axis(data, axis, unit_name, kwargs)
2116 # We only need to update if there is nothing set yet.
2117 if not axis.have_units():
-> 2118 axis.update_units(data)
2119
2120 # Check for units in the kwargs, and if present update axis
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axis.py in update_units(self, data)
1471 neednew = self.converter != converter
1472 self.converter = converter
-> 1473 default = self.converter.default_units(data, self)
1474 if default is not None and self.units is None:
1475 self.set_units(default)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in default_units(data, axis)
101 # default_units->axis_info->convert
102 if axis.units is None:
--> 103 axis.set_units(UnitData(data))
104 else:
105 axis.units.update(data)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in __init__(self, data)
167 self._counter = itertools.count()
168 if data is not None:
--> 169 self.update(data)
170
171 def update(self, data):
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in update(self, data)
184 data = np.atleast_1d(np.array(data, dtype=object))
185
--> 186 for val in OrderedDict.fromkeys(data):
187 if not isinstance(val, (str, bytes)):
188 raise TypeError("{val!r} is not a string".format(val=val))
TypeError: unhashable type: 'numpy.ndarray'
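Arrays are unhashable because they're mutable. You can hash a 1-D array by converting it to an immutable tuple (by wrapping it with tuple()), but you usually shouldn't be trying to hash arrays anyway. Your data is probably of the wrong shape: pd.DataFrame(df.Won) is a two-dimensional, single-column DataFrame, so matplotlib's category converter turns X into a 2-D object array (the np.array(data, dtype=object) line in the traceback). Iterating over that array yields row arrays rather than strings, and those rows cannot be used as keys in OrderedDict.fromkeys. Passing a 1-D Series such as df.Won (or df["Won"]) avoids this.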
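The shape problem can be reproduced without matplotlib. The sketch below is a simplified imitation of the loop in category.py shown in the traceback. It is not matplotlib's actual code, `unique_labels` is a hypothetical helper name, and it assumes only NumPy is installed:

```python
from collections import OrderedDict

import numpy as np


def unique_labels(data):
    """Hypothetical helper imitating the traceback's conversion step:
    coerce to an object array, then collect unique values in first-seen order."""
    arr = np.atleast_1d(np.array(data, dtype=object))
    return list(OrderedDict.fromkeys(arr))


# 1-D input: iterating yields plain str values, which are hashable.
print(unique_labels(["NO", "NO", "YES"]))

# 2-D single-column input (the shape of a one-column DataFrame):
# iterating yields row arrays, which cannot be dictionary keys.
two_d = [["NO"], ["NO"], ["YES"]]
try:
    unique_labels(two_d)
except TypeError as exc:
    print("TypeError:", exc)

# Workarounds: hash each row as a tuple, or (usually better) flatten to 1-D.
rows = np.array(two_d, dtype=object)
print(list(OrderedDict.fromkeys(tuple(row) for row in rows)))
print(unique_labels(np.ravel(rows)))
```

Flattening gives back plain strings, which is what a categorical axis expects. The tuple version only makes the rows hashable. It does not turn them into labels.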
First, you don't need df=pd.DataFrame(file). After you open the CSV file with pandas and save it in the file variable, you already have the data as a DataFrame.
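Here is a minimal check of this, using a made-up in-memory CSV instead of academy_awards.csv. It also uses on_bad_lines="skip", which assumes pandas 1.3 or later. The error_bad_lines=False option used in the question was deprecated in pandas 1.3 in favour of on_bad_lines and removed in pandas 2.0.

```python
import io

import pandas as pd

# Made-up sample data; the last row has one field too many.
csv_text = (
    "Year,Category,Won\n"
    "2010 (83rd),Actor -- Leading Role,NO\n"
    "2010 (83rd),Actor -- Leading Role,YES\n"
    "2010 (83rd),Actor -- Leading Role,YES,extra\n"
)

# on_bad_lines="skip" drops malformed rows (pandas >= 1.3).
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")

# read_csv already returns a DataFrame, so wrapping it again changes nothing.
print(type(df).__name__)
print(pd.DataFrame(df).equals(df))
print(len(df))  # the malformed row was skipped
```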
Then, you can simply call scatter and choose the x-axis and y-axis columns:
df.plot(kind="scatter", x="Won", y="Category")
You don't need to preprocess the data, because pandas has already parsed the file into a DataFrame when you opened it.
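If you prefer to call plt.scatter directly, as in the question, the key is to pass 1-D Series rather than a one-column DataFrame. The following is a minimal sketch that assumes pandas and matplotlib are installed. The rows are copied from the question and stand in for academy_awards.csv. The Agg backend is only there so the sketch runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Rows from the question, standing in for academy_awards.csv.
csv_text = """Year,Category,Nominee,Additional Info,Won
2010 (83rd),Actor -- Leading Role,Javier Bardem,Biutiful {'Uxbal'},NO
2010 (83rd),Actor -- Leading Role,Jeff Bridges,True Grit {'Rooster Cogburn'},NO
2010 (83rd),Actor -- Leading Role,Jesse Eisenberg,The Social Network {'Mark Zuckerberg'},NO
2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES
2010 (83rd),Actor -- Leading Role,James Franco,127 Hours {'Aron Ralston'},NO
2010 (83rd),Actor -- Supporting Role,Christian Bale,The Fighter {'Dicky Eklund'},YES
"""
df = pd.read_csv(io.StringIO(csv_text))

# pd.DataFrame(df.Won) is 2-D (rows x 1 column); df["Won"] is a 1-D Series.
print(pd.DataFrame(df.Won).shape)
X = df["Won"]
y = df["Category"]
print(X.shape, y.shape)

# Both inputs are 1-D sequences of strings, so matplotlib places them on
# categorical axes instead of failing while building the category mapping.
points = plt.scatter(X, y)
plt.xlabel("Won")
plt.ylabel("Category")
```

In an interactive session, call plt.show() afterwards to display the figure.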