
Plotting timeseries in seaborn using pandas dataframe

I am trying to explore some data that I have imported from a .csv. Within the file, there are around 100 'Companies', each of which has an 'SG' and an 'RG' metric. The file holds several years of values for these metrics, one year per column.

I am trying to build some seaborn time series charts that overlay the 'SG' and 'RG' lines on the same chart for a given 'Company'. This image should explain what I mean:

[Image: Summary of the lux dataframe]

Could anyone give me some guidance on how to create such a plot? For example, plotting the 'Barbour' Company (from the image above) with two lines, one for 'SG' and one for 'RG'.

Note: all the data types are floats, but with some NaNs in there, and I have included the usual imports up front, such as:

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

It is typically better to carry your data in long-form when using seaborn (well, in general). An exception to this is when you'd like to use heatmaps, for which you'll need to pivot into a clean 2-variable table or matrix.
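As a small illustration of that heatmap exception, here is a minimal sketch of pivoting long-form rows into a company-by-year matrix. The frame `long_df` and its values are made up for the example; they are not the asker's data.

```python
import pandas as pd

# Hypothetical long-form data: one row per (company, year) observation.
long_df = pd.DataFrame({
    "company": ["A", "A", "B", "B"],
    "year": [2005, 2006, 2005, 2006],
    "value": [1.0, 4.0, 2.0, 5.0],
})

# Pivot into a company-by-year matrix: companies as rows, years as columns.
matrix = long_df.pivot(index="company", columns="year", values="value")
print(matrix)

# A matrix of this shape is what sns.heatmap takes as input,
# e.g. sns.heatmap(matrix, annot=True).
```

For every other kind of plot discussed here, the long-form frame is the one to keep around.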

Anyway, in long-form, each row holds exactly one observation, and supporting features are just additional columns on these rows. In your particular case, each row has several observations (one for each year, given a company and metric), so we'd like to convert this to long-form, in which the year is merely another feature of your observation. Fortunately, pnd.melt can help you with that, as it is intended for that exact purpose.

Let's start with a generic pnd.DataFrame, based on yours:

In [1]: import pandas as pnd
In [2]: import seaborn as sns

In [3]: df = pnd.DataFrame.from_dict({
     ...:      'company': ['A', 'B', 'A', 'B', 'C'],
     ...:      'metric': ['SG', 'SG', 'RG', 'RG', 'SG'], 
     ...:      '2005': [1, 2, 3, 4, 5],
     ...:      '2006': [4, 5, 6, 7, 8]})    
In [4]: df
Out[4]: 
   2005  2006 company metric
0     1     4       A     SG
1     2     5       B     SG
2     3     6       A     RG
3     4     7       B     RG
4     5     8       C     SG

Converting to long-form using pnd.melt:

In [5]: df_melt = pnd.melt(df, 
                           id_vars=['company', 'metric'], 
                           value_vars=['2005', '2006'], 
                           var_name='year', 
                           value_name='value')
In [6]: df_melt
Out[6]: 
  company metric  year  value
0       A     SG  2005      1
1       B     SG  2005      2
2       A     RG  2005      3
3       B     RG  2005      4
4       C     SG  2005      5
5       A     SG  2006      4
6       B     SG  2006      5
7       A     RG  2006      6
8       B     RG  2006      7
9       C     SG  2006      8
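Since the question mentions NaNs and asks about a single company, a little cleanup after melting may help before plotting. The sketch below is only one way to do it: `tidy_company` is a hypothetical helper name, and the small `df_long` frame stands in for the melted data with one missing value added.

```python
import numpy as np
import pandas as pd

def tidy_company(df_long, company):
    """Return clean long-form rows for one company (hypothetical helper)."""
    out = df_long[df_long["company"] == company].copy()
    out = out.dropna(subset=["value"])     # drop years with no measurement
    out["year"] = out["year"].astype(int)  # '2005' -> 2005, for a numeric x-axis
    return out.sort_values(["metric", "year"]).reset_index(drop=True)

# Small stand-in for the melted frame, with one missing value.
df_long = pd.DataFrame({
    "company": ["A", "B", "A", "B", "A", "A"],
    "metric": ["SG", "SG", "RG", "RG", "SG", "RG"],
    "year": ["2005", "2005", "2005", "2005", "2006", "2006"],
    "value": [1.0, 2.0, 3.0, 4.0, np.nan, 6.0],
})

company_a = tidy_company(df_long, "A")
print(company_a)
```

Converting the year column matters because melt takes the years from the column labels, which are strings in the example above, so they would otherwise be treated as categories rather than numbers.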

And finally, with sns.factorplot you can use parameters such as x, hue, row, and col to visualize the data through factor breakdowns:

In [7]: sns.factorplot(data=df_melt, 
                       x='year', 
                       y='value', 
                       hue='metric', 
                       col='company')

Out[7]: <seaborn.axisgrid.FacetGrid at 0x7f6286fee890>

In [8]: from matplotlib import pyplot as plt
In [9]: plt.show()

Figure 1
