
How can I visualize time data from a Pandas Dataframe?

Once in a while I have time data where I would like to just visualize how often events are occurring. So I basically have a list of datetimes and I want to show a plot with

  • x-axis is the hour of day (0 to 23, hence 24 bins)
  • y-axis is the number of events

So basically it is a histogram, grouped by hour.

I already have one solution, but how do I make sure that all 24 bins exist? (and it could look nicer, too)

Minimal Example

#!/usr/bin/env python


"""Create and visualize data with timestamps."""

# core modules
from datetime import datetime
import random

# 3rd party module
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def create_data(num_samples, year, month_p=None, day_p=None):
    """
    Create timestamp data.

    Parameters
    ----------
    num_samples : int
    year: int
    month_p : int, optional (default: None)
    day_p : int, optional (default: None)

    Returns
    -------
    data : pandas.DataFrame object
    """
    data = []
    for _ in range(num_samples):
        if month_p is None:
            month = random.randint(1, 12)
        else:
            month = month_p
        if day_p is None:
            day = random.randint(1, 28)
        else:
            day = day_p
        hour = int(np.random.normal(loc=7) * 3) % 24
        minute = random.randint(0, 59)
        data.append({'date': datetime(year, month, day, hour, minute)})
    data = sorted(data, key=lambda n: n['date'])
    return pd.DataFrame(data)


def visualize_data(df):
    """
    Plot data binned by hour.

    x-axis is the hour, y-axis is the number of datapoints.

    Parameters
    ----------
    df : pandas.DataFrame object
    """
    df.groupby(df["date"].dt.hour).count().plot(kind="bar")
    plt.show()


df = create_data(2000, 2017)
visualize_data(df)

As you can see, the 7, 9 and 10 are missing.

[image]

Reindex the resulting DataFrame so that it contains all 24 hours (missing hours get a count of 0), and then call the plot method:

res = df.groupby(df["date"].dt.hour).count().reindex(np.arange(24), fill_value=0)
res.plot(kind="bar")
plt.show()

[image]
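
The reindex step can also be checked without plotting. Below is a minimal sketch, assuming the timestamps can be parsed by pd.to_datetime; the helper name hourly_counts and the three sample timestamps are made up for illustration and are not part of the question's code.

```python
import pandas as pd


def hourly_counts(timestamps):
    """Count events per hour of day, keeping all 24 hours (0 to 23)."""
    # hourly_counts is a hypothetical helper, not from the original post
    hours = pd.Series(pd.to_datetime(timestamps)).dt.hour
    counts = hours.value_counts()
    # reindex adds every hour that never occurred, with a count of 0,
    # and puts the hours in ascending order
    return counts.reindex(range(24), fill_value=0)


# Illustrative sample data: two events in hour 6, none in hour 7, one in hour 8
sample = ["2017-01-01 06:15", "2017-01-02 06:40", "2017-01-03 08:05"]
counts = hourly_counts(sample)
print(counts.loc[[6, 7, 8]])
```

The result always has 24 rows, so counts.plot(kind="bar") draws one slot per hour even when an hour has no events.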

Try this function:

def visualize_data(df):
    """
    Plot data binned by hour.

    x-axis is the hour, y-axis is the number of datapoints.

    Parameters
    ----------
    df : pandas.DataFrame object
    """
    y = df.groupby(df["date"].dt.hour).count()
    for i in range(24):
        y.loc[i] = 0 if i not in y.index else y.loc[i]  # Add missing locations.
    y.sort_index(inplace=True)  # Sort the rows by hour.
    y.plot(kind="bar")
    plt.show()
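
To make the plot look nicer, pick a matplotlib style before plotting (this needs "import matplotlib" in addition to the pyplot import):

matplotlib.style.use('ggplot')

See https://pandas.pydata.org/pandas-docs/stable/visualization.html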
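
As a rough sketch of how the style and the 24 fixed bins might be combined: the function name plot_hourly, its parameters, the axis labels and the sample data are assumptions for illustration. It uses plt.style.context so the style applies only to this figure rather than globally.

```python
import pandas as pd
from matplotlib import pyplot as plt


def plot_hourly(df, column="date", style="ggplot"):
    """Bar plot of events per hour with all 24 hours shown."""
    # plot_hourly is a hypothetical helper, not from the original post
    counts = df[column].dt.hour.value_counts().reindex(range(24), fill_value=0)
    # Apply the style only inside this block instead of changing global settings
    with plt.style.context(style):
        fig, ax = plt.subplots()
        counts.plot(kind="bar", ax=ax)
        ax.set_xlabel("hour of day")
        ax.set_ylabel("number of events")
    return ax


# Illustrative sample data: one event in hour 6, two in hour 21
df = pd.DataFrame(
    {"date": pd.to_datetime(["2017-03-01 06:15", "2017-03-01 21:30", "2017-03-02 21:45"])}
)
ax = plot_hourly(df)
```

Call plt.show() afterwards to display the figure, or ax.figure.savefig(...) to write it to a file.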

"As you can see, the 7, 9 and 10 are missing."

Because there are 0 events in those hours?
