
Python plot data with timestamp (with hours) using plotnine

I have the following dataframe:

    point               timestamp_local         0
0   A                   2019-07-20 00:00:00     1
1   A                   2019-07-20 01:00:00     3
2   B                   2019-07-20 02:00:00     158
3   A                   2019-07-20 02:30:00     324
4   B                   2019-07-20 03:00:00     502

The dataframe tells me, for each point and each time (timestamp_local), how many connections I had. The column named 0 holds the connection count.

I want to plot this data using the plotnine library. I have already done this, and it works when I use timestamps without times, e.g. 2019-07-20. But when I use timestamps with times, e.g. 2019-07-20 00:00:00, it does not work.

This is my Python command to plot the data without times:

pn.ggplot(df, pn.aes(x="timestamp_local", y="0", group="point", color="point")) + pn.geom_line(stat="identity")

This returns a figure where I can see the counts per day, grouped by point. [figure]

I now have two questions:

  1. How can I plot the same result when using timestamps with times, like 2019-07-20 01:00:00? (The data span several days, so I cannot just cut off the date!)
  2. How can I plot the same result grouped by month and year? (E.g. 2019-07, 2019-08, 2019-09 and so on.)

I would strongly prefer a solution with the plotnine library because there are more functions I want to use later on, e.g. smoothing. If it's not possible with plotnine, I would like a single figure with one line per point, each in a different color, as in the figure above, where red is point A and blue is point B.

Kind regards

I stored the provided data in conn.csv; the code below also includes some theme customization. The first case displays the full timestamp, as requested, using the date_format function from mizani ( https://mizani.readthedocs.io/en/stable/formatters.html#mizani.formatters.date_format ).

from plotnine import *
import pandas as pd
from mizani.formatters import date_format

# Parse the second column (timestamp_local) as datetimes while reading the file.
df = pd.read_csv('conn.csv', parse_dates=[1])

# Small grey tick labels; the x labels are rotated so full timestamps fit.
custom_axis = theme(axis_text_x=element_text(color="grey", size=6, angle=90, hjust=.3),
                    axis_text_y=element_text(color="grey", size=6),
                    plot_title=element_text(size=25, face="bold"),
                    axis_title=element_text(size=10))

# One line per point; the x axis shows date and time down to the second.
(
    ggplot(data=df, mapping=aes(x='timestamp_local', y='0', group="point", color="point"))
    + geom_line(stat="identity")
    + custom_axis
    + ylab("Count")
    + xlab("TimeStamp")
    + labs(title="Count of the Connections")
    + scale_x_datetime(labels=date_format("%Y-%m-%d %H:%M:%S"))
)

[Figure: full timestamp plot]
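The snippet above relies on parse_dates so that timestamp_local arrives as a datetime column. If the dataframe is already in memory, a similar conversion can be done with pd.to_datetime. This is a minimal sketch, assuming the timestamps are currently stored as strings in the format shown in the question; the question does not say what dtype the column actually has.

```python
import pandas as pd

# Rows copied from the question; timestamps are deliberately plain strings here
# (an assumption about how the original dataframe might look).
df = pd.DataFrame({
    "point": ["A", "A", "B", "A", "B"],
    "timestamp_local": [
        "2019-07-20 00:00:00",
        "2019-07-20 01:00:00",
        "2019-07-20 02:00:00",
        "2019-07-20 02:30:00",
        "2019-07-20 03:00:00",
    ],
    "0": [1, 3, 158, 324, 502],
})

# Convert the strings to real datetimes; an explicit format avoids guessing.
df["timestamp_local"] = pd.to_datetime(
    df["timestamp_local"], format="%Y-%m-%d %H:%M:%S"
)

# True once the column holds datetimes, which is what scale_x_datetime expects.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["timestamp_local"])
print(is_datetime)
```

After this conversion the dataframe can be passed to the same ggplot call as above.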

For the second case, the to_period function is used to extract a month_year column, which is then used for the aggregation. geom_point is used because the provided data is too limited to draw a line.

[Figure: year-month aggregation]
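The code for this second case is not shown, so the following is only a sketch of how the to_period aggregation might look, not the original code. The column name connections (renamed from 0 for readability) and the helper names monthly_counts and plot_monthly are assumptions made for illustration.

```python
import io

import pandas as pd

# Rows copied from the question, inlined so the sketch runs without conn.csv.
CSV = """point,timestamp_local,0
A,2019-07-20 00:00:00,1
A,2019-07-20 01:00:00,3
B,2019-07-20 02:00:00,158
A,2019-07-20 02:30:00,324
B,2019-07-20 03:00:00,502
"""


def monthly_counts(df):
    """Sum the connection counts per point and per calendar month."""
    out = df.copy()
    # to_period("M") keeps only year and month; str gives labels like "2019-07".
    out["month_year"] = out["timestamp_local"].dt.to_period("M").astype(str)
    return out.groupby(["point", "month_year"], as_index=False)["connections"].sum()


def plot_monthly(monthly):
    """Build a plotnine scatter plot of the monthly totals (hypothetical helper)."""
    # Imported here so the aggregation above can run without plotnine installed.
    from plotnine import aes, geom_point, ggplot, xlab, ylab

    return (
        ggplot(monthly, aes(x="month_year", y="connections", color="point"))
        + geom_point()
        + xlab("Month")
        + ylab("Count")
    )


df = pd.read_csv(io.StringIO(CSV), parse_dates=["timestamp_local"])
df = df.rename(columns={"0": "connections"})  # assumed, more readable name
monthly = monthly_counts(df)
print(monthly)
```

Calling plot_monthly(monthly) returns the plot object. With data covering several months, geom_line with group="point" could replace geom_point to get one line per point.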
