Pandas / pyplot histogram: can plot df but not subset
df is an enormous dataframe. I only need the subset where Zcoord > 1.
df = pandas.DataFrame(first)
df.columns = ['Xcoord', 'Ycoord', 'Zcoord', 'Angle']
df0 = df[df.Zcoord>1]
The very same code that draws a histogram of df does not work for df0.
plot1 = plt.figure(1)
plt.hist(df0.Zcoord, bins=100, normed=False)
plt.show()
IPython spits out KeyError: 0.
Python 2.7.9 (Anaconda), IPython 2.2.0, OS 10.9.4
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-42-71643df3888f> in <module>()
1 plot1 = plt.figure(1)
----> 2 plt.hist(df0.Zcoord, bins=100, normed=False)
3
4 plt.show()
5 from matplotlib.backends.backend_pdf import PdfPages
/Users/Kit/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, **kwargs)
2888 histtype=histtype, align=align, orientation=orientation,
2889 rwidth=rwidth, log=log, color=color, label=label,
-> 2890 stacked=stacked, **kwargs)
2891 draw_if_interactive()
2892 finally:
/Users/Kit/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5560 # Massage 'x' for processing.
5561 # NOTE: Be sure any changes here is also done below to 'weights'
-> 5562 if isinstance(x, np.ndarray) or not iterable(x[0]):
5563 # TODO: support masked arrays;
5564 x = np.asarray(x)
/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
482 def __getitem__(self, key):
483 try:
--> 484 result = self.index.get_value(self, key)
485
486 if not np.isscalar(result):
/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
1194
1195 try:
-> 1196 return self._engine.get_value(s, k)
1197 except KeyError as e1:
1198 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2993)()
/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2808)()
/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3440)()
KeyError: 0
You are passing a pandas.Series (df0.Zcoord) to matplotlib. However, at the moment, matplotlib is a bit indecisive about whether or not it likes being fed pandas data types (as opposed to numpy ndarrays).
At some point in the bowels of the matplotlib source, the histogram function tries to get the "first item I've been asked to deal with", and it does that with a call to input[0], where input is whatever it was asked to chew on (the traceback above shows this as x[0] inside hist). If input is a numpy.ndarray, then everything works great, because input[0] is the element at position 0. However, if input is a pandas.Series or (even worse) a pandas.DataFrame, the expression input[0] has a very different meaning: on a Series with an integer index it looks up the label 0, not the first position. In that case, depending on the structure of the data you fed to plt.hist, there could well be a KeyError when trying to index into your input.
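To make the difference concrete, here is a minimal sketch. The values and the index labels (10, 20, 30) are made up for illustration and are not from the question; the point is only that [0] is positional on an ndarray but a label lookup on a Series with an integer index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three values, with Series labels that do not include 0.
arr = np.array([0.5, 1.5, 2.5])
ser = pd.Series(arr, index=[10, 20, 30])

# On an ndarray, [0] means "element at position 0".
first_from_array = arr[0]

# On a Series, .iloc[0] is the explicit way to ask for position 0.
first_by_position = ser.iloc[0]

# On a Series with an integer index, [0] means "row labelled 0".
# No such label exists here, so pandas raises KeyError.
try:
    ser[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

Here first_from_array and first_by_position are both 0.5, while the plain ser[0] lookup fails, which is the same failure mode as in the traceback.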
In your particular case, this is probably working fine on df as a whole because df likely has an integer index ([0, 1, 2, ..., len(df)-1]), which is the default row index in a DataFrame. However, when you select within df to make df0, the result winds up with an index that is a subset of the index of df (maybe it winds up [3, 6, 9, 12, ...]). So everything works fine on df (where the index contains 0), but blows chunks on df0 (where, ironically, given its name, 0 does not appear in the index).
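A small sketch of that effect, using a made-up six-row frame in place of the real data (only the Zcoord column name comes from the question): boolean filtering keeps the original row labels rather than renumbering them.

```python
import pandas as pd

# Hypothetical stand-in for the real data; default index is 0..5.
df = pd.DataFrame({"Zcoord": [0.2, 0.7, 1.4, 0.9, 2.1, 3.3]})

# Same selection as in the question.
df0 = df[df.Zcoord > 1]

# The subset keeps the labels of the rows that survived the filter.
full_index = list(df.index)
subset_index = list(df0.index)

# Label 0 is present in df but missing from df0, because row 0 has Zcoord <= 1.
has_zero_full = 0 in df.index
has_zero_subset = 0 in df0.index
```

With this sample data the subset index is [2, 4, 5], so a lookup of label 0 on df0.Zcoord has nothing to find.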
Quick fix: instead of
plt.hist(df0.Zcoord, bins=100, normed=False)
run this
plt.hist(df0.Zcoord.values, bins=100, normed=False)
and my guess is everything will be good.
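For reference, a sketch of a few ways to hand matplotlib something whose position 0 is reachable. The sample frame is hypothetical; .values is the fix suggested above, while to_numpy() (available in pandas 0.24 and later) and reset_index(drop=True) are alternatives that are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.
df = pd.DataFrame({"Zcoord": [0.2, 0.7, 1.4, 0.9, 2.1, 3.3]})
df0 = df[df.Zcoord > 1]

# Option 1 (the fix above): strip the index and pass a plain ndarray.
z_values = df0.Zcoord.values

# Option 2: the same conversion via the newer accessor (pandas >= 0.24).
z_numpy = df0.Zcoord.to_numpy()

# Option 3: keep a Series, but renumber its index from 0 so label 0 exists.
z_reset = df0.Zcoord.reset_index(drop=True)
```

Any of these can be passed as the first argument to plt.hist in place of df0.Zcoord. Note also that recent matplotlib releases no longer accept the normed argument; density is its replacement.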