KeyError when trying to plot or histogram pandas data in matplotlib
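I'm having trouble generating a basic distribution histogram from an imported CSV file. The code works for a set of data from another CSV, but not for the set I'm interested in, which is essentially the same. Here is the code I tried: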
import pandas as pd
import numpy as np
import matplotlib as plt
data = pd.read_csv("idcases.csv")
data1 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Marin")]
data2 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Sonoma")]
fig = plt.pyplot.figure()
ax = fig.add_subplot(111)
ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
plt.pyplot.xlabel('Population')
plt.pyplot.ylabel('Count of Population')
plt.pyplot.show()
This produces:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-35-63303aa9d8a5> in <module>()
1 fig = plt.pyplot.figure()
2 ax = fig.add_subplot(111)
----> 3 ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
4 plt.pyplot.xlabel('Count')
5 plt.pyplot.ylabel('Count of Population')
C:\Program Files (x86)\Anaconda\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5602 # Massage 'x' for processing.
5603 # NOTE: Be sure any changes here is also done below to 'weights'
-> 5604 if isinstance(x, np.ndarray) or not iterable(x[0]):
5605 # TODO: support masked arrays;
5606 x = np.asarray(x)
C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
549 def __getitem__(self, key):
550 try:
--> 551 result = self.index.get_value(self, key)
552
553 if not np.isscalar(result):
C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
1721
1722 try:
-> 1723 return self._engine.get_value(s, k)
1724 except KeyError as e1:
1725 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()
KeyError: 0L
What am I doing wrong? Here is part of the data I'm reading. The code doesn't work for any of the fields, including "Count" or "Rate".
Disease County Year Sex Count Population Rate CI.lower \
882 Amebiasis Marin 2001 Total 14 247731 5.651 3.090
883 Amebiasis Marin 2001 Female 0 125414 0.000 0.000
884 Amebiasis Marin 2001 Male 0 122317 0.000 0.000
885 Amebiasis Marin 2002 Total 7 247382 2.830 1.138
886 Amebiasis Marin 2002 Female 0 125308 0.000 0.000
887 Amebiasis Marin 2002 Male 0 122074 0.000 0.000
888 Amebiasis Marin 2003 Total 9 247280 3.640 1.664
889 Amebiasis Marin 2003 Female 0 125259 0.000 0.000
890 Amebiasis Marin 2003 Male 0 122021 0.000 0.000
After upgrading from matplotlib-v1.4.3 to matplotlib-v1.5.0, I noticed that plotting a pandas.Series stopped working, for example:
ax.plot_date(df['date'], df['raw'], '.-', label='raw')
This raises a KeyError: 0 exception. You need to pass a numpy.ndarray rather than a pandas.Series to the plot_date function:
ax.plot_date(df['date'].values, df['raw'].values, '.-', label='raw')
Let's look at the full traceback of the exception:
# ... PREVIOUS TRACEBACK MESSAGES OMITTED FOR BREVITY ...
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\matplotlib\dates.py in default_units(x, axis)
1562
1563 try:
-> 1564 x = x[0]
1565 except (TypeError, IndexError):
1566 pass
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
555 def __getitem__(self, key):
556 try:
--> 557 result = self.index.get_value(self, key)
558
559 if not np.isscalar(result):
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
1788
1789 try:
-> 1790 return self._engine.get_value(s, k)
1791 except KeyError as e1:
1792 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()
KeyError: 0
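Note that the error originates from matplotlib trying to execute x = x[0]. If your pandas.Series does not use a zero-based integer index, this fails, because it looks up the index label 0 rather than the 0th element of the pandas.Series.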
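To make the difference between label lookup and positional lookup concrete, here is a minimal sketch. The Series and its index labels (882 to 884) are made up to mimic a filtered DataFrame column; they are not the asker's actual objects.

```python
import pandas as pd

# Hypothetical Series whose integer index starts at 882, as it would after
# filtering rows out of a larger DataFrame.
population = pd.Series([247731, 125414, 122317], index=[882, 883, 884])

# With an integer index, population[0] is a label lookup, and no row is
# labelled 0, so pandas raises KeyError.
try:
    population[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access works regardless of the index labels.
first_by_position = population.iloc[0]

# A plain NumPy array has no labels, so [0] is always positional.
first_from_array = population.values[0]

print(lookup_failed, first_by_position, first_from_array)
```

This is the same lookup that x[0] performs inside matplotlib in the traceback above, which is why handing over the underlying array avoids the error.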
To fix this, we need to get a numpy.ndarray from the data in the pandas.Series and then use that for plotting:
ax.plot_date(df['date'].values, df['raw'].values, '.-', label='raw')
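The same workaround can be applied to the histogram from the question. The sketch below is only an illustration: the small DataFrame stands in for idcases.csv, its rows and population figures are made up, and the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for idcases.csv; values are illustrative only.
data = pd.DataFrame({
    "Disease": ["Measles", "Amebiasis", "Amebiasis", "Amebiasis"],
    "County": ["Marin", "Marin", "Marin", "Sonoma"],
    "Population": [247731, 125414, 122317, 458614],
})

# Filtering keeps the original row labels, so this index no longer starts at 0.
data1 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Marin")]

# Pass the underlying NumPy array instead of the Series.
population = data1["Population"].values

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(
    population, bins=10, range=(population.min(), population.max())
)
ax.set_xlabel("Population")
ax.set_ylabel("Count of Population")
```

Calling data1.reset_index(drop=True) before plotting should also help, since it gives the filtered frame a fresh zero-based index, but passing .values is the approach this answer describes.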
This code gives me the plot:
import io
import pandas as pd
import matplotlib.pyplot as plt
s = """ Disease County Year Sex Count Population Rate CI.lower
Amebiasis Marin 2001 Total 14 247731 5.651 3.090
Amebiasis Marin 2001 Female 0 125414 0.000 0.000
Amebiasis Marin 2001 Male 0 122317 0.000 0.000
Amebiasis Marin 2002 Total 7 247382 2.830 1.138
Amebiasis Marin 2002 Female 0 125308 0.000 0.000
Amebiasis Marin 2002 Male 0 122074 0.000 0.000
Amebiasis Marin 2003 Total 9 247280 3.640 1.664
Amebiasis Marin 2003 Female 0 125259 0.000 0.000
Amebiasis Marin 2003 Male 0 122021 0.000 0.000 """
fobj = io.StringIO(s)
data1 = pd.read_csv(fobj, sep=r"\s+")  # whitespace-delimited; delim_whitespace is deprecated in recent pandas
plt.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
plt.xlabel('Population')
plt.ylabel('Count of Population')
plt.show()