Pandas / pyplot histogram: can plot df but not subset

df is an enormous dataframe. I only need the subset where Zcoord > 1.

df = pandas.DataFrame(first)
df.columns = ['Xcoord', 'Ycoord', 'Zcoord', 'Angle']
df0 = df[df.Zcoord>1]

The very same code that draws a histogram of df does not work for df0.

plot1 = plt.figure(1)
plt.hist(df0.Zcoord, bins=100, normed=False)
plt.show()

IPython spits out KeyError: 0.

Python 2.7.9 (Anaconda), IPython 2.2.0, OS X 10.9.4

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-42-71643df3888f> in <module>()
      1 plot1 = plt.figure(1)
----> 2 plt.hist(df0.Zcoord, bins=100, normed=False)
      3 
      4 plt.show()
      5 from matplotlib.backends.backend_pdf import PdfPages

/Users/Kit/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, **kwargs)
   2888                       histtype=histtype, align=align, orientation=orientation,
   2889                       rwidth=rwidth, log=log, color=color, label=label,
-> 2890                       stacked=stacked, **kwargs)
   2891         draw_if_interactive()
   2892     finally:

/Users/Kit/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   5560         # Massage 'x' for processing.
   5561         # NOTE: Be sure any changes here is also done below to 'weights'
-> 5562         if isinstance(x, np.ndarray) or not iterable(x[0]):
   5563             # TODO: support masked arrays;
   5564             x = np.asarray(x)

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    482     def __getitem__(self, key):
    483         try:
--> 484             result = self.index.get_value(self, key)
    485 
    486             if not np.isscalar(result):

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1194 
   1195         try:
-> 1196             return self._engine.get_value(s, k)
   1197         except KeyError as e1:
   1198             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2993)()

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2808)()

/Users/Kit/anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3440)()

KeyError: 0

You are passing a pandas.Series to matplotlib (df0.Zcoord). However, at the moment, matplotlib is a bit indecisive about whether or not it likes being fed pandas datatypes (as opposed to numpy ndarrays).

At some point in the bowels of the matplotlib source, the histogram function tries to get the "first item I've been asked to deal with", and it does that with a call to input[0], where input is whatever it was asked to chew on (your traceback shows this as the iterable(x[0]) check in axes/_axes.py). If input is a numpy.ndarray then everything works great. However, if input is a pandas.Series or (even worse) a pandas.DataFrame, the expression input[0] has a very different meaning: it is a lookup by index label, not by position. In that case, depending on the structure of the data you fed to plt.hist, there could well be a KeyError when trying to index into your input.

In your particular case, this is probably working fine on df as a whole because df likely has an integer index ([0, 1, 2, ..., len(df)-1]), which is the default row index in a DataFrame. However, when you select within df to make df0, the result winds up with an index that is a subset of the index of df (maybe it winds up [3, 6, 9, 12, ...]). So everything works fine on df (where the index contains 0), but blows chunks on df0 (where, ironically, given its name, 0 does not appear in the index).
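To see the label-versus-position difference in isolation, here is a minimal sketch. The four-row frame and its values are made up for illustration and stand in for the real data; only the column name Zcoord comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the real data: default integer index 0..3.
df = pd.DataFrame({"Zcoord": [0.5, 2.0, 0.1, 3.0]})
df0 = df[df.Zcoord > 1]

# Filtering keeps the original row labels, so label 0 is gone from df0.
df_labels = list(df.index)    # [0, 1, 2, 3]
df0_labels = list(df0.index)  # [1, 3]

# On df, label 0 exists, so [0] works.
first_of_df = df.Zcoord[0]

# On df0, [0] is a lookup of the label 0, which no longer exists.
try:
    df0.Zcoord[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access still works, because it ignores the labels.
first_by_position = df0.Zcoord.iloc[0]

print(df_labels, df0_labels, lookup_failed, first_by_position)
```

This is the same failure as in the traceback: matplotlib asks for x[0] and pandas answers with KeyError: 0.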

Quick fix: instead of

plt.hist(df0.Zcoord, bins=100, normed=False)

run this

plt.hist(df0.Zcoord.values, bins=100, normed=False)

and my guess is everything will be good.
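A small sketch of why .values helps, again on made-up data. np.histogram is used here only as a stand-in for plt.hist so that the snippet runs without a display; the reset_index alternative is my own addition and not part of the suggestion above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.
df = pd.DataFrame({"Zcoord": [0.5, 2.0, 0.1, 3.0, 1.5]})
df0 = df[df.Zcoord > 1]

# .values hands back a plain numpy.ndarray with no pandas index attached,
# so [0] means "first element" again.
z = df0.Zcoord.values
is_ndarray = isinstance(z, np.ndarray)
first_value = z[0]

# Alternative (an assumption, not from the answer): rebuild a 0-based index
# so that the label 0 exists again.
z_reset = df0.Zcoord.reset_index(drop=True)
first_after_reset = z_reset[0]

# Binning the array works as expected; every selected row is counted once.
counts, edges = np.histogram(z, bins=3)
print(is_ndarray, first_value, first_after_reset, counts.sum())
```

Either form can be passed to plt.hist in place of df0.Zcoord.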
