
Plotting pandas DataFrame with matplotlib

Here is a sample of the code I am using, which works perfectly well:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Data
df=pd.DataFrame({'x': np.arange(10), 'y1': np.random.randn(10), 'y2': np.random.randn(10)+
    range(1,11), 'y3': np.random.randn(10)+range(11,21) })
print(df) 
# multiple line plot
plt.plot( 'x', 'y1', data=df, marker='o', markerfacecolor='blue', markersize=12, color='skyblue', linewidth=4)
plt.plot( 'x', 'y2', data=df, marker='', color='olive', linewidth=2)
plt.plot( 'x', 'y3', data=df, marker='', color='olive', linewidth=2, linestyle='dashed', label="y3")
plt.legend()
plt.show()

The values in column 'x' actually refer to a 10-hour period of the day, with 0 standing for 6 AM, 1 for 7 AM, and so on. Is there any way I could replace those values on the x-axis of my figure with the times of day, for example showing 6 AM instead of 0?

It's always a good idea to store time or datetime information using a pandas datetime datatype.

In your example, if you only want to keep the time information:

df['time'] = (df.x + 6) * pd.Timedelta(1, unit='h')    

Output

   x        y1        y2         y3     time
0  0 -0.523190  1.681115  11.194223 06:00:00
1  1 -1.050002  1.727412  13.360231 07:00:00
2  2  0.284060  4.909793  11.377206 08:00:00
3  3  0.960851  2.702884  14.054678 09:00:00
4  4 -0.392999  5.507870  15.594092 10:00:00
5  5 -0.999188  5.581492  15.942648 11:00:00
6  6 -0.555095  6.139786  17.808850 12:00:00
7  7 -0.074643  7.963490  18.486967 13:00:00
8  8  0.445099  7.301115  19.005115 14:00:00
9  9 -0.214138  9.194626  20.432349 15:00:00

If you have a starting date:

start_date='2018-07-29' # change this date appropriately
df['datetime'] = pd.to_datetime(start_date) + (df.x + 6) * pd.Timedelta(1, unit='h')

Output

   x        y1        y2         y3     time            datetime
0  0 -0.523190  1.681115  11.194223 06:00:00 2018-07-29 06:00:00
1  1 -1.050002  1.727412  13.360231 07:00:00 2018-07-29 07:00:00
2  2  0.284060  4.909793  11.377206 08:00:00 2018-07-29 08:00:00
3  3  0.960851  2.702884  14.054678 09:00:00 2018-07-29 09:00:00
4  4 -0.392999  5.507870  15.594092 10:00:00 2018-07-29 10:00:00
5  5 -0.999188  5.581492  15.942648 11:00:00 2018-07-29 11:00:00
6  6 -0.555095  6.139786  17.808850 12:00:00 2018-07-29 12:00:00
7  7 -0.074643  7.963490  18.486967 13:00:00 2018-07-29 13:00:00
8  8  0.445099  7.301115  19.005115 14:00:00 2018-07-29 14:00:00
9  9 -0.214138  9.194626  20.432349 15:00:00 2018-07-29 15:00:00

Now the time and datetime columns have special datatypes:

print(df.dtypes)
Out[5]: 
x                     int32
y1                  float64
y2                  float64
y3                  float64
time        timedelta64[ns]
datetime     datetime64[ns]
dtype: object

These datatypes have many useful properties, including built-in string formatting, which you will find very useful in later parts of your project.
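As a rough illustration of the string formatting mentioned above, here is a minimal sketch that turns a datetime column into labels such as "6 AM". The small frame and the name hour_labels are made up for this example, and the AM/PM text produced by '%I %p' assumes an English (or C) locale.

```python
import pandas as pd

# Hypothetical three-row frame, built the same way as the 'datetime' column above
df = pd.DataFrame({'x': [0, 1, 2]})
df['datetime'] = pd.to_datetime('2018-07-29') + (df.x + 6) * pd.Timedelta(1, unit='h')

# '%I %p' gives a zero-padded 12-hour clock ("06 AM"); strip the leading zero
hour_labels = df['datetime'].dt.strftime('%I %p').str.lstrip('0')
print(hour_labels.tolist())
```

Labels built this way could then be passed to plt.xticks(df.x, hour_labels) if you would rather keep the integer 'x' column on the axis.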

Finally, to plot using matplotlib, with the hour of the day (6 to 15) on the x-axis:

# multiple line plot
# label= is given explicitly so that all three lines appear in the legend
plt.plot( df.datetime.dt.hour, df['y1'], marker='o', markerfacecolor='blue', markersize=12, color='skyblue', linewidth=4, label="y1")
plt.plot( df.datetime.dt.hour, df['y2'], marker='', color='olive', linewidth=2, label="y2")
plt.plot( df.datetime.dt.hour, df['y3'], marker='', color='olive', linewidth=2, linestyle='dashed', label="y3")
plt.legend()
plt.show()
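If you want the ticks to read "06 AM", "07 AM" and so on rather than the bare hour numbers, one possible approach (a sketch, not part of the code above) is to plot the datetime column directly and let matplotlib.dates format the axis. The choice of HourLocator and the '%I %p' format are assumptions made for this example, and the AM/PM text depends on the locale.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mdates

# Same construction as above, reduced to one y column for brevity
df = pd.DataFrame({'x': np.arange(10), 'y1': np.random.randn(10)})
df['datetime'] = pd.to_datetime('2018-07-29') + (df.x + 6) * pd.Timedelta(1, unit='h')

fig, ax = plt.subplots()
ax.plot(df['datetime'], df['y1'], marker='o', color='skyblue', linewidth=4, label='y1')

# One tick per hour, rendered on a 12-hour clock with AM/PM
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%I %p'))
ax.legend()
```

Call plt.show() afterwards to display the figure. Because the x values are real datetimes here, the same code keeps working if the data later spans more than one day.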
