DataFrame in pandas with time series data

Question

I just started learning pandas and came across this:

rng = date_range('1/1/2011', periods=72, freq='H')
s = Series(randn(len(rng)), index=rng)

I understand what the code above does, and I tried it in IPython:

import numpy as np
from numpy.random import randn
from pandas import date_range, Series, DataFrame  # names used below
r = date_range('1/1/2011', periods=72, freq='H')
r
len(r)
[r[i] for i in range(len(r))]
s = Series(randn(len(r)), index=r)
s
s.plot()
df_new = DataFrame(data = s, columns=['Random Number Generated'])

Is this the correct way to create a DataFrame?

The next step given is to: return a series where the absolute difference between a number and the next number in the series is less than 0.5.

Do I need to find the difference between each pair of consecutive random numbers and keep only the entries where the absolute difference is < 0.5? Can someone explain how I can do that in pandas?

I also tried to plot the series as a histogram with:

 df_new.diff().hist()

The graph displays the random numbers on the x-axis and a y-axis running from 0 to 18, which I don't understand. Can someone explain this to me as well?

Answer 1

To give you some pointers in addition to @Dthal's comments:

import pandas as pd
r = pd.date_range('1/1/2011', periods=72, freq='H')

As commented by @Dthal, you can simplify the creation of your DataFrame of values randomly sampled from the normal distribution like so:

df = pd.DataFrame(index=r, data=randn(len(r)), columns=['Random Number Generated'])
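If you already have the Series, as in the question, one alternative is to convert it with Series.to_frame() rather than passing it to the DataFrame constructor. This is only a sketch: the seed, the generator name gen, and the variable names are assumptions for illustration, and the frequency is given as a Timedelta so that it does not depend on the 'H' alias.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
gen = np.random.default_rng(0)

# 72 hourly timestamps starting at 2011-01-01 00:00.
r = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))

# Series of standard-normal samples indexed by the timestamps.
s = pd.Series(gen.standard_normal(len(r)), index=r)

# Turn the Series into a one-column DataFrame and name the column.
df_from_series = s.to_frame(name="Random Number Generated")
```

Either route should give a single-column DataFrame that shares the DatetimeIndex of the Series.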

To show only the rows where the value differs by less than 0.5 from the preceding value (note that the output below lists the differences themselves, not the original values):

diff = df.diff()
diff[abs(diff['Random Number Generated']) < 0.5]

                     Random Number Generated
2011-01-01 02:00:00                 0.061821
2011-01-01 05:00:00                 0.463712
2011-01-01 09:00:00                -0.402802
2011-01-01 11:00:00                -0.000434
2011-01-01 22:00:00                 0.295019
2011-01-02 03:00:00                 0.215095
2011-01-02 05:00:00                 0.424368
2011-01-02 08:00:00                -0.452416
2011-01-02 09:00:00                -0.474999
2011-01-02 11:00:00                 0.385204
2011-01-02 12:00:00                -0.248396
2011-01-02 14:00:00                 0.081890
2011-01-02 17:00:00                 0.421897
2011-01-02 18:00:00                 0.104898
2011-01-03 05:00:00                -0.071969
2011-01-03 15:00:00                 0.101156
2011-01-03 18:00:00                -0.175296
2011-01-03 20:00:00                -0.371812

You can also use .dropna() to get rid of the missing values; the first row of diff is NaN because it has no preceding value.
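The filter above returns the differences. If the exercise wants the original numbers, a boolean mask built from the differences can be applied to the Series itself. The sketch below shows both readings of the task (difference to the preceding value and difference to the next value). The seed and all variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

gen = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
r = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))
s = pd.Series(gen.standard_normal(len(r)), index=r)

# s.diff() is s[i] - s[i-1]; the first entry is NaN, and NaN < 0.5 is False,
# so the first timestamp is never selected.
close_to_previous = s[s.diff().abs() < 0.5]

# s.diff(periods=-1) is s[i] - s[i+1]; here the last entry is NaN instead.
close_to_next = s[s.diff(periods=-1).abs() < 0.5]
```

Both results contain original values from s. They have the same length, because each counts the adjacent pairs whose gap is below 0.5, but they are labelled by the later and the earlier timestamp of each pair respectively.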

The pandas.Series.hist() docs state that the default number of bins is 10, so that is the number of bars you should expect. The y-axis shows how many of the differences fall into each bin. In this case the histogram turns out roughly symmetric around zero, ranging roughly over [-4, +4].

Series.hist(by=None, ax=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, figsize=None, bins=10, **kwds)

diff.hist()
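To see where the bar heights come from without plotting, the same binning can be reproduced with numpy.histogram. This is a sketch with an arbitrary seed and made-up variable names. The exact counts depend on the random draw, so none are shown here.

```python
import numpy as np
import pandas as pd

gen = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
r = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))
s = pd.Series(gen.standard_normal(len(r)), index=r)

# 72 values give 71 consecutive differences once the leading NaN is dropped.
diffs = s.diff().dropna()

# counts[i] is the number of differences falling between edges[i] and edges[i+1];
# these counts are what the y-axis of a 10-bin histogram displays.
counts, edges = np.histogram(diffs, bins=10)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:+.2f}, {right:+.2f}): {n}")
```

The ten counts add up to 71, so a y-axis reaching about 18 just means that the fullest bin held roughly 18 of the 71 differences.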

Solution 1
1 2015-12-31 06:32:10
