How to scale the y-axis of a pandas histogram plot?

I have a whole year of data sampled at fifteen-minute intervals, and I want to create a histogram that counts hours rather than fifteen-minute intervals.

Toy example code

I have the following toy example code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set()  # apply the seaborn style before plotting so the figure picks it up
df = pd.read_csv(r"D:/tmp/load.csv")
df.hist(bins=range(20, 80, 5))
plt.xlabel("Value")
plt.ylabel("count")
plt.show()

This produces the following graph:

[Image: histogram]

The data in the DataFrame has the following form:

>>> df[(df["Time"] > "2021-04-10 19:45:00") & (df["Time"] < "2021-04-10 21:00:00")]
                     Time      tag
9584  2021-04-10 20:00:00  50.3840
9585  2021-04-10 20:15:00  37.8332
9586  2021-04-10 20:30:00  36.6808
9587  2021-04-10 20:45:00  37.1840

Expected result

I need to change the y-axis values of the histogram so that they show the number of hours rather than the number of fifteen-minute intervals. For example, the first column should show 10 (40/4) instead of 40; in other words, every y-axis value should be divided by 4.
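The scaling described here is plain arithmetic: each 15-minute sample stands for a quarter of an hour, so every bin count is divided by 4. As a minimal sketch (assuming NumPy is available, and using only the four sample readings shown in the excerpt above), giving each sample a weight of 0.25 in `numpy.histogram` turns sample counts into hour counts:

```python
import numpy as np

# The four 15-minute readings from the DataFrame excerpt above
tags = np.array([50.3840, 37.8332, 36.6808, 37.1840])
bins = np.arange(20, 80, 5)  # same edges as range(20, 80, 5)

# Plain counts: number of 15-minute samples per bin
counts, _ = np.histogram(tags, bins=bins)

# Weighted counts: each 15-minute sample contributes 0.25 hours
hours, _ = np.histogram(tags, bins=bins, weights=np.full(len(tags), 0.25))

print(counts)
print(hours)
```

The same `weights` argument is accepted by `matplotlib.pyplot.hist`, and `DataFrame.hist` forwards extra keyword arguments to it, so passing `weights=np.full(len(df), 0.25)` there should make the bar heights read directly in hours.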

Question

How can I scale the y-axis of the histogram? Should I use the plt.yticks function somehow?
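A tick formatter is one way to do what `plt.yticks` hints at: the bars stay in 15-minute counts, but each y tick label is divided by 4. A minimal sketch using `matplotlib.ticker.FuncFormatter`, assuming hypothetical random readings in place of `load.csv` and the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical stand-in for load.csv: one day (96) of 15-minute readings
rng = np.random.default_rng(0)
df = pd.DataFrame({"tag": rng.uniform(20, 75, size=96)})

# Histogram of the raw 15-minute sample counts
ax = df["tag"].plot(kind="hist", bins=range(20, 80, 5))

# Relabel the y ticks: a tick at 40 samples is displayed as 10 (hours)
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, pos: f"{y / 4:g}"))
ax.set_xlabel("Value")
ax.set_ylabel("hours")
```

Because the tick positions are still chosen for sample counts, the labels can come out as non-round values such as 0.5; rescaling the counts themselves avoids that.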

Here is my take on your interesting question.

I don't know of a way to rescale the y-axis after the dataframe has been plotted, but you can rescale the dataframe itself.

For instance, in the following toy dataframe, with a measurement interval of 15 minutes, 9 values fall between 35 and 40:

  • 4 values were measured between 20:00:00 and 20:59:00
  • 1 between 21:00:00 and 21:59:00
  • 3 between 22:00:00 and 22:59:00
  • 1 between 23:00:00 and 23:59:00

import pandas as pd

df = pd.DataFrame(
    {
        "index": [
            "2021-04-10 20:00:00",
            "2021-04-10 20:15:00",
            "2021-04-10 20:30:00",
            "2021-04-10 20:45:00",
            "2021-04-10 21:00:00",
            "2021-04-10 21:15:00",
            "2021-04-10 21:30:00",
            "2021-04-10 21:45:00",
            "2021-04-10 22:00:00",
            "2021-04-11 22:15:00",
            "2021-04-11 22:30:00",
            "2021-04-11 22:45:00",
            "2021-04-11 23:00:00",
            "2021-04-11 23:15:00",
            "2021-04-11 23:30:00",
            "2021-04-11 23:45:00",
        ],
        "tag": [39, 36, 36, 37, 42, 28, 39, 54, 43, 38, 39, 36, 44, 27, 38, 28],
    },
)
df["index"] = pd.to_datetime(df["index"], format="%Y-%m-%d %H:%M:%S")

Here is the corresponding plot:

df.copy().set_index("index").plot(
    kind="hist", bins=range(20, 80, 5), yticks=range(0, 10), grid=True
)

[Image: histogram of the toy dataframe]

Had the measurements been hourly, 4 values would have been found in the 35-40 bin:

  • 1 (and not 4) between 20:00:00 and 20:59:00
  • 1 between 21:00:00 and 21:59:00
  • 1 (and not 3) between 22:00:00 and 22:59:00
  • 1 between 23:00:00 and 23:59:00
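Both figures can be checked without plotting. As a minimal sketch on the same toy data (rebuilt here so the snippet runs on its own), grouping the 35-40 readings by calendar date and hour gives 9 readings spread over 4 distinct hours. The bin is taken as left-closed, [35, 40), to match matplotlib's histogram bins:

```python
import pandas as pd

# Same toy data as above, rebuilt compactly
times = pd.to_datetime([
    "2021-04-10 20:00:00", "2021-04-10 20:15:00", "2021-04-10 20:30:00",
    "2021-04-10 20:45:00", "2021-04-10 21:00:00", "2021-04-10 21:15:00",
    "2021-04-10 21:30:00", "2021-04-10 21:45:00", "2021-04-10 22:00:00",
    "2021-04-11 22:15:00", "2021-04-11 22:30:00", "2021-04-11 22:45:00",
    "2021-04-11 23:00:00", "2021-04-11 23:15:00", "2021-04-11 23:30:00",
    "2021-04-11 23:45:00",
])
tags = [39, 36, 36, 37, 42, 28, 39, 54, 43, 38, 39, 36, 44, 27, 38, 28]
df = pd.DataFrame({"index": times, "tag": tags})

# Keep only readings in the left-closed [35, 40) bin
in_bin = df[(df["tag"] >= 35) & (df["tag"] < 40)]

# Count readings per (date, hour); each row of the result is one hour
per_hour = in_bin.groupby(
    [in_bin["index"].dt.date.rename("date"), in_bin["index"].dt.hour.rename("hour")]
).size()

print(per_hour)
print("15-minute readings:", per_hour.sum())  # bar height with 15-minute data
print("distinct hours:", len(per_hour))       # bar height on an hourly basis
```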

So, rescaling the dataframe on an hourly basis requires you to:

  • assign new columns for bins, dates, and hours
  • sort the values and drop rows that share the same bin, date, and hour, keeping only the first row of each duplicate group
  • clean up and plot

_ = (
    df.assign(
        bin=pd.cut(df["tag"], bins=range(20, 80, 5), right=False),  # same [a, b) bins as the histogram
        date=df["index"].dt.date,
        hour=df["index"].dt.hour,
    )
    .sort_values(by=["bin", "date", "hour"])
    .drop_duplicates(subset=["bin", "date", "hour"], keep="first")
    .drop(columns=["bin", "date", "hour"])
    .set_index("index")
    .plot(kind="hist", bins=range(20, 80, 5), yticks=range(0, 5), grid=True)
)
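To inspect the bar heights without drawing anything, the same pipeline can end with a count per bin instead of `.plot(...)`. A minimal sketch, rebuilding the toy data so it runs on its own and assuming left-closed bins (`right=False`) so that `pd.cut` agrees with the histogram's bin edges; the sort step is omitted because it does not change which (bin, date, hour) combinations survive:

```python
import pandas as pd

# Toy data from above, rebuilt so this snippet runs on its own
times = pd.to_datetime([
    "2021-04-10 20:00:00", "2021-04-10 20:15:00", "2021-04-10 20:30:00",
    "2021-04-10 20:45:00", "2021-04-10 21:00:00", "2021-04-10 21:15:00",
    "2021-04-10 21:30:00", "2021-04-10 21:45:00", "2021-04-10 22:00:00",
    "2021-04-11 22:15:00", "2021-04-11 22:30:00", "2021-04-11 22:45:00",
    "2021-04-11 23:00:00", "2021-04-11 23:15:00", "2021-04-11 23:30:00",
    "2021-04-11 23:45:00",
])
tags = [39, 36, 36, 37, 42, 28, 39, 54, 43, 38, 39, 36, 44, 27, 38, 28]
df = pd.DataFrame({"index": times, "tag": tags})

# Keep one row per (bin, date, hour), as in the pipeline above
hourly = df.assign(
    bin=pd.cut(df["tag"], bins=range(20, 80, 5), right=False),
    date=df["index"].dt.date,
    hour=df["index"].dt.hour,
).drop_duplicates(subset=["bin", "date", "hour"], keep="first")

# Bar height per bin, keyed by the bin's left edge (e.g. 35 for [35, 40))
counts = hourly["bin"].value_counts(sort=False)
counts_by_left = {interval.left: n for interval, n in counts.items()}
print(counts_by_left)
```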

This outputs:

[Image: histogram after the hourly rescaling]

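Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce this content, please cite this site's URL or the original address. For any questions, please contact: yoyou2525@163.com.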

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM