
Data Frame in pandas with Time Series Data

I just started learning pandas. I came across this:

rng = date_range('1/1/2011', periods=72, freq='H')
s = Series(randn(len(rng)), index=rng)

I understood what the above code means, and I tried it in IPython:

import numpy as np
from numpy.random import randn
from pandas import DataFrame, Series, date_range  # names used below
import time
r = date_range('1/1/2011', periods=72, freq='H')  # 72 hourly timestamps
r
len(r)
[r[i] for i in range(len(r))]
s = Series(randn(len(r)), index=r)  # random values indexed by time
s
s.plot()
df_new = DataFrame(data = s, columns=['Random Number Generated'])

Is this the correct way to create a data frame?

The next step given is: return a series where the absolute difference between a number and the next number in the series is less than 0.5.

Do I need to find the difference between each pair of consecutive random numbers and keep only those where the absolute difference is < 0.5? Can someone explain how I can do that in pandas?

I also tried to plot the series as a histogram with:

 df_new.diff().hist()

The graph displays the random numbers on the X axis and a Y axis running from 0 to 18 (which I don't understand). Can someone explain this to me as well?

To give you some pointers in addition to @Dthal's comments:

r = pd.date_range('1/1/2011', periods=72, freq='H')

As commented by @Dthal, you can simplify the creation of your DataFrame, randomly sampled from the normal distribution, like so:

df = pd.DataFrame(index=r, data=randn(len(r)), columns=['Random Number Generated'])
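If you already have the Series, another route is Series.to_frame, which wraps it in a one-column DataFrame. The following is only a minimal sketch: the seed and the variable names are assumptions for illustration, and freq='h' is the lowercase spelling of the hourly alias that recent pandas versions prefer over 'H'.

```python
import numpy as np
import pandas as pd

# Hourly index as in the question ('h' is the lowercase hourly alias)
r = pd.date_range('1/1/2011', periods=72, freq='h')

# Seeded generator so the sketch is reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(len(r)), index=r)

# Wrap the existing Series in a one-column DataFrame
df = s.to_frame(name='Random Number Generated')
print(df.shape)  # (72, 1)
```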

To show only values that differ by less than 0.5 from the preceding value:

diff = df.diff()
diff[abs(diff['Random Number Generated']) < 0.5]

                     Random Number Generated
2011-01-01 02:00:00                 0.061821
2011-01-01 05:00:00                 0.463712
2011-01-01 09:00:00                -0.402802
2011-01-01 11:00:00                -0.000434
2011-01-01 22:00:00                 0.295019
2011-01-02 03:00:00                 0.215095
2011-01-02 05:00:00                 0.424368
2011-01-02 08:00:00                -0.452416
2011-01-02 09:00:00                -0.474999
2011-01-02 11:00:00                 0.385204
2011-01-02 12:00:00                -0.248396
2011-01-02 14:00:00                 0.081890
2011-01-02 17:00:00                 0.421897
2011-01-02 18:00:00                 0.104898
2011-01-03 05:00:00                -0.071969
2011-01-03 15:00:00                 0.101156
2011-01-03 18:00:00                -0.175296
2011-01-03 20:00:00                -0.371812
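Note that this output lists the differences themselves, measured against the preceding value. If the exercise instead wants the original numbers whose distance to the next number is below 0.5, a possible sketch follows. It assumes that reading of the task, and the sample values are made up so the result is easy to check by hand.

```python
import pandas as pd

# Small hand-checkable sample (values are made up for illustration)
s = pd.Series([1.0, 1.2, 2.0, 2.4, 0.0])

# diff(-1) gives s[i] - s[i+1]; the last entry is NaN (no next value)
to_next = s.diff(-1).abs()

# Keep the original values whose gap to the next value is below 0.5
close_to_next = s[to_next < 0.5]
print(close_to_next.tolist())  # [1.0, 2.0]
```

The same mask applied to the hourly series s from the question would be s[s.diff(-1).abs() < 0.5].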

You can simplify this by using .dropna() to get rid of the missing values.
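One way to read that remark (an assumption, since the code for it is not shown): masking the whole DataFrame leaves NaN wherever the condition fails, and .dropna() then removes those rows. A small sketch with made-up values:

```python
import pandas as pd

# Made-up values so the result can be checked by hand
df = pd.DataFrame({'Random Number Generated': [0.0, 0.3, 1.5, 1.4, -1.0]})

diff = df.diff()  # first row is NaN because it has no predecessor

# Boolean-DataFrame indexing keeps the shape and puts NaN where the test fails;
# dropna() then discards those rows along with the leading NaN
small = diff[diff.abs() < 0.5].dropna()
print(small.index.tolist())  # [1, 3]
```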

The pandas.Series.hist() docs state that the default number of bins is 10, so that is the number of bars you should expect. In this case the histogram turns out roughly symmetric around zero, ranging over roughly [-4, +4].

Series.hist(by=None, ax=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, figsize=None, bins=10, **kwds)

diff.hist()
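As for the Y axis: each bar's height is the number of values that fall into that bin, so a range of 0 to 18 just means the fullest bin holds about 18 of the differences. A sketch with numpy.histogram, which also defaults to 10 bins, makes the counts visible without plotting (the seed and variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
s = pd.Series(rng.standard_normal(72))

diff = s.diff().dropna()  # 71 differences; the leading NaN is dropped

# counts[i] is the height of bar i; edges are the bin boundaries
counts, edges = np.histogram(diff, bins=10)
print(len(counts), counts.sum())  # 10 71
```

The bar heights always add up to the number of non-missing differences, here 71 for a series of 72 values.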

