
Plotting time-series data with seaborn

Say I create a fully random DataFrame using the following:

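import datetime
from random import randrange

import numpy as np
import pandas as pd

def random_date(start, end):
    # Pick a random date in the half-open range [start, end)
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + datetime.timedelta(seconds=random_second)

def rand_dataframe():
  # Stand-in for pandas.util.testing.makeDataFrame(), which is not available
  # in recent pandas releases: 30 rows of random floats in columns A-D
  df = pd.DataFrame(np.random.randn(30, 4), columns=list("ABCD"))
  df['date'] = [random_date(datetime.date(2014, 3, 18), datetime.date(2014, 4, 1)) for _ in range(df.shape[0])]
  df = df.sort_values(by='date')
  return df

df = rand_dataframe()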

which results in the dataframe shown at the bottom of this post. I would like to plot my columns A, B, C and D using the time-series visualization features in seaborn so that I get something along these lines:

[image]

How can I approach this problem? From what I read in this notebook, the call should be:

sns.tsplot(df, time="time", unit="unit", condition="condition", value="value")

but this seems to require that the dataframe is represented in a different way, with the columns somehow encoding time, unit, condition and value, which is not my case. How can I convert my dataframe (shown below) into this format?

Here is my dataframe:

      date         A         B         C         D

2014-03-18  1.223777  0.356887  1.201624  1.968612
2014-03-18  0.160730  1.888415  0.306334  0.203939
2014-03-18 -0.203101 -0.161298  2.426540  0.056791
2014-03-18 -1.350102  0.990093  0.495406  0.036215
2014-03-18 -1.862960  2.673009 -0.545336 -0.925385
2014-03-19  0.238281  0.468102 -0.150869  0.955069
2014-03-20  1.575317  0.811892  0.198165  1.117805
2014-03-20  0.822698 -0.398840 -1.277511  0.811691
2014-03-20  2.143201 -0.827853 -0.989221  1.088297
2014-03-20  0.299331  1.144311 -0.387854  0.209612
2014-03-20  1.284111 -0.470287 -0.172949 -0.792020
2014-03-22  1.031994  1.059394  0.037627  0.101246
2014-03-22  0.889149  0.724618  0.459405  1.023127
2014-03-23 -1.136320 -0.396265 -1.833737  1.478656
2014-03-23 -0.740400 -0.644395 -1.221330  0.321805
2014-03-23 -0.443021 -0.172013  0.020392 -2.368532
2014-03-23  1.063545  0.039607  1.673722  1.707222
2014-03-24  0.865192 -0.036810 -1.162648  0.947431
2014-03-24 -1.671451  0.979238 -0.701093 -1.204192
2014-03-26 -1.903534 -1.550349  0.267547 -0.585541
2014-03-27  2.515671 -0.271228 -1.993744 -0.671797
2014-03-27  1.728133 -0.423410 -0.620908  1.430503
2014-03-28 -1.446037 -0.229452 -0.996486  0.120554
2014-03-28 -0.664443 -0.665207  0.512771  0.066071
2014-03-29 -1.093379 -0.936449 -0.930999  0.389743
2014-03-29  1.205712 -0.356070 -0.595944  0.702238
2014-03-29 -1.069506  0.358093  1.217409 -2.286798
2014-03-29  2.441311  1.391739 -0.838139  0.226026
2014-03-31  1.471447 -0.987615  0.201999  1.228070
2014-03-31 -0.050524  0.539846  0.133359 -0.833252

In the end, what I am looking for is an overlay of plots (one per column), where each of them looks as follows (note that different CI values get different alpha values):

[image]

I don't think tsplot is going to work with the data you have. It assumes that you've sampled the same units at each timepoint (although you can have missing timepoints for some units).

For example, say you measured blood pressure from the same people every day for a month, and then you wanted to plot the average blood pressure by condition (where maybe the "condition" variable is the diet they are on). tsplot could do this, with a call that would look something like sns.tsplot(df, time="day", unit="person", condition="diet", value="blood_pressure").
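To make that layout concrete, here is a minimal sketch of a long-form table with the four columns named in the call above. The person IDs, diet labels, and blood-pressure values are invented for illustration; only the column names come from the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical study: 6 people, each assigned to one of two made-up diets
people = [f"p{i}" for i in range(6)]
diet = {p: ("low_salt" if i % 2 == 0 else "control") for i, p in enumerate(people)}

# One row per (day, person): the same units are measured at every timepoint
records = [
    {"day": day, "person": p, "diet": diet[p],
     "blood_pressure": 120 + rng.normal(0, 5)}
    for day in range(1, 31)
    for p in people
]
bp = pd.DataFrame(records)

# Each person contributes exactly one observation per day
obs_per_person = bp.groupby("person")["day"].nunique()

# Average blood pressure per day and diet (the central line of such a plot)
mean_bp = bp.groupby(["day", "diet"])["blood_pressure"].mean().unstack("diet")
print(mean_bp.head())
```

The key property is that "person" identifies a unit that recurs across days. Note that tsplot was deprecated in seaborn 0.9 in favor of lineplot, so on a current install the same long-form table would be passed to lineplot instead.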

That scenario is different from having large groups of people on different diets and each day randomly sampling some from each group and measuring their blood pressure. From the example you gave, it seems like your data are structured like this.
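One quick way to see this in the question's table is to count the rows per date. The sketch below only re-enters the date column from the posted dataframe; the variable names are my own.

```python
import pandas as pd

# Date column copied from the question's table (30 rows in total)
dates = pd.Series(
    ["2014-03-18"] * 5 + ["2014-03-19"] * 1 + ["2014-03-20"] * 5
    + ["2014-03-22"] * 2 + ["2014-03-23"] * 4 + ["2014-03-24"] * 2
    + ["2014-03-26"] * 1 + ["2014-03-27"] * 2 + ["2014-03-28"] * 2
    + ["2014-03-29"] * 4 + ["2014-03-31"] * 2,
    name="date",
)

# Number of observations on each date: it varies from day to day
counts = dates.value_counts().sort_index()
print(counts)

# Calendar days in the range that have no observations at all
all_days = pd.date_range("2014-03-18", "2014-03-31")
missing_days = all_days.difference(pd.to_datetime(counts.index))
print(missing_days)
```

The number of rows differs from date to date and there is no column identifying which unit each row came from, so the rows look like independent samples rather than repeated measurements of the same units.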

However, it's not that hard to come up with a mix of matplotlib and pandas that will do what I think you want:

import numpy as np
import pandas as pd
import seaborn as sns

# Read in the data from the stackoverflow question
df = pd.read_clipboard().iloc[1:]

# Convert it to "long-form" or "tidy" representation
df = pd.melt(df, id_vars=["date"], var_name="condition")

# Plot the average value by condition and date
ax = df.groupby(["condition", "date"]).mean().unstack("condition").plot()

# Get a reference to the x-points corresponding to the dates and the colors
x = np.arange(len(df.date.unique()))
palette = sns.color_palette()

# Calculate the 25th and 75th percentiles of the data
# and plot a translucent band between them
for cond, cond_df in df.groupby("condition"):
    low = cond_df.groupby("date").value.apply(np.percentile, 25)
    high = cond_df.groupby("date").value.apply(np.percentile, 75)
    ax.fill_between(x, low, high, alpha=.2, color=palette.pop(0))

This code produces:

[image]
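For anyone who wants to try the same idea without copying the table to the clipboard, here is a self-contained sketch under a few assumptions: the data are synthetic (random values on random dates in the same two-week window), the percentiles are computed with pandas' quantile rather than np.percentile, and the x-axis uses real datetimes instead of integer positions. The helper names summarize and plot_bands are hypothetical, not part of any library.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: 30 rows on random dates, columns A-D
rng = np.random.default_rng(0)
offsets = pd.to_timedelta(rng.integers(0, 14, size=30), unit="D")
wide = pd.DataFrame(rng.standard_normal((30, 4)), columns=list("ABCD"))
wide["date"] = pd.Timestamp("2014-03-18") + offsets

# Long-form ("tidy") representation: one row per (date, condition, value)
long_df = wide.melt(id_vars="date", var_name="condition", value_name="value")


def summarize(data):
    """Mean and 25th/75th percentiles of value per condition and date."""
    grouped = data.groupby(["condition", "date"])["value"]
    return pd.DataFrame({
        "mean": grouped.mean(),
        "low": grouped.quantile(0.25),
        "high": grouped.quantile(0.75),
    }).reset_index()


def plot_bands(summary, ax=None):
    """Draw one mean line per condition with a translucent percentile band."""
    if ax is None:
        _, ax = plt.subplots()
    for cond, part in summary.groupby("condition"):
        (line,) = ax.plot(part["date"], part["mean"], label=cond)
        # Reuse the line's color so each band matches its line
        ax.fill_between(part["date"], part["low"], part["high"],
                        alpha=0.2, color=line.get_color())
    ax.legend(title="condition")
    return ax


summary = summarize(long_df)
ax = plot_bands(summary)
```

Call plt.show() (or ax.figure.savefig(...)) afterwards to view the result. Taking the band color from the line itself avoids relying on the seaborn palette and the matplotlib color cycle being in the same order.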
