Plot a histogram with y-axis as percentage (using FuncFormatter?)

I have a list of data in which the numbers are between 1000 and 20,000.

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

When I plot a histogram using the hist() function, the y-axis represents the number of occurrences of the values within a bin. Instead of the number of occurrences, I would like to show the percentage of occurrences.

[Image: histogram of the data list above]

Code for the above plot:

import matplotlib.pyplot as plt

f, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.hist(data, bins=len(set(data)))  # as many equal-width bins as distinct values

I've been looking at this post, which describes an example using FuncFormatter, but I can't figure out how to adapt it to my problem. Some help and guidance would be welcome :)

EDIT: The main issue is the to_percent(y, position) function used by the FuncFormatter. I guess y corresponds to one given value on the y-axis. I need to divide this value by the total number of elements, which I apparently can't pass to the function...

EDIT 2: Current solution, which I dislike because it uses a global variable:

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def to_percent(y, position):
    # Ignore the passed in position. This has the effect of scaling the default
    # tick locations.
    # n is the total number of data points, set globally by plotting_hist()
    global n

    s = str(round(100 * y / n, 3))

    # The percent symbol needs escaping in latex
    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'
    else:
        return s + '%'

def plotting_hist(folder, output):
    global n

    data = list()
    # Do stuff to create data from folder

    n = len(data)
    f, ax = plt.subplots(1, 1, figsize=(10,5))
    ax.hist(data, bins = len(list(set(data))), rwidth = 1)

    formatter = FuncFormatter(to_percent)
    plt.gca().yaxis.set_major_formatter(formatter)

    plt.savefig("{}.png".format(output), dpi=500)
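The global can also be avoided with a closure. FuncFormatter calls its function as func(value, position) for every tick, so any extra data such as n has to be bound in advance. A minimal sketch, assuming a hypothetical helper name make_percent_formatter (it is not part of matplotlib):

```python
import matplotlib
from matplotlib.ticker import FuncFormatter


def make_percent_formatter(n):
    """Hypothetical helper: return a FuncFormatter that shows y / n as a percentage."""

    def to_percent(y, position):
        # FuncFormatter passes the tick value and its position; n comes from
        # the enclosing scope instead of a global variable.
        s = str(round(100 * y / n, 3))

        # The percent symbol needs escaping in LaTeX
        if matplotlib.rcParams['text.usetex']:
            return s + r'$\%$'
        return s + '%'

    return FuncFormatter(to_percent)


# In plotting_hist() this would be:
#     ax.yaxis.set_major_formatter(make_percent_formatter(len(data)))
formatter = make_percent_formatter(7)
print(formatter(3, 0))  # prints 42.857%
```

Each call to make_percent_formatter captures its own n, so several histograms with different sizes can be formatted without shared state.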

EDIT 3: Method with density = True:

[Image: histogram plotted with density = True]

Actual desired output (method with global variable):

[Image: desired histogram with percentages on the y-axis]

Other answers seem overly complicated. A histogram that shows the proportion instead of the absolute amount can easily be produced by weighting the data with 1/n, where n is the number of data points.

Then a PercentFormatter can be used to show the proportion (e.g. 0.45) as a percentage (45%).

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

plt.hist(data, weights=np.ones(len(data)) / len(data))

plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()

[Image: histogram with the y-axis shown as percentages]

Here we see that three of the 7 values are in the first bin, i.e. 3/7 = 43%.
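The arithmetic can be checked without plotting: np.histogram accepts the same weights argument, so the returned bar heights are the fraction of values in each bin. A small sketch, assuming matplotlib's default of 10 equal-width bins:

```python
import numpy as np

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# Each value contributes 1/n instead of 1, so each bin holds a fraction of n
weights = np.ones(len(data)) / len(data)
fractions, edges = np.histogram(data, bins=10, weights=weights)

print(edges[:2])        # first bin spans [1000, 2500): holds 1000, 1000, 2000
print(fractions[0])     # 3/7, about 0.4286
print(fractions.sum())  # the fractions add up to 1 (up to rounding)
```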

You can calculate the percentages yourself and then plot them as a bar chart. This requires numpy.histogram (which matplotlib uses under the hood anyway). You can then adjust the y tick labels:

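import matplotlib.pyplot as plt
import numpy as np

f, ax = plt.subplots(1, 1, figsize=(10, 5))
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# Count the values per bin (as many equal-width bins as distinct values)
heights, bins = np.histogram(data, bins=len(set(data)))

# Convert the counts to percentages of the total
percent = heights / heights.sum() * 100

# One bar per bin, starting at the left bin edge and as wide as the bin
ax.bar(bins[:-1], percent, width=np.diff(bins), align="edge")

# Fix the tick positions before relabelling them with a percent sign
vals = ax.get_yticks()
ax.set_yticks(vals)
ax.set_yticklabels(['%1.2f%%' % i for i in vals])

plt.show()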

[Image: bar chart of the percentage of values per bin]
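Note that bins=len(set(data)) does not create one bin per distinct value; it asks for that many equal-width bins. The percentage step can be checked on its own. A minimal sketch, where percent_per_bin is a hypothetical helper name:

```python
import numpy as np


def percent_per_bin(data, bins):
    """Hypothetical helper: percentage of values in each bin, plus the bin edges."""
    heights, edges = np.histogram(data, bins=bins)
    # Vectorized equivalent of [i / sum(heights) * 100 for i in heights]
    return heights / heights.sum() * 100, edges


data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
percent, edges = percent_per_bin(data, bins=len(set(data)))

print(edges)    # 6 equal-width bins of 2500 between 1000 and 16000
print(percent)  # 4/7, 2/7, 0, 0, 0 and 1/7 of the values, in percent
```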

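Simply set density=True and the counts are normalized implicitly. Be aware, though, that density=True divides each count by the total number of values and by the bin width, so that the area under the histogram (not the sum of the bar heights) is 1; the bar heights equal the fraction of values per bin only when every bin has width 1.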

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

plt.hist(data, density=True)

plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()
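To see what density=True actually plots, it can be compared numerically with the weights approach: multiplying each density by its bin width recovers the fraction of values per bin. A sketch using np.histogram, which applies the same normalization as plt.hist:

```python
import numpy as np

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# density=True: counts / (total count * bin width), so the area integrates to 1
density, edges = np.histogram(data, bins=10, density=True)
widths = np.diff(edges)

# Multiplying by the bin width turns densities back into fractions of n
fractions = density * widths

counts, _ = np.histogram(data, bins=10)
print(density[0])    # about 0.000286, far below the 3/7 shown with weights
print(fractions[0])  # about 0.4286, i.e. 3/7
print(np.allclose(fractions, counts / len(data)))  # True
```

With bins 1500 units wide, PercentFormatter(1) on the density plot therefore labels the first bar as roughly 0.03% rather than 43%.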

I think the simplest way is to use seaborn, which is a layer on top of matplotlib. Note that you can still customize the plot with plt.subplots(), figsize, fig and ax; to draw into an existing ax, use the axes-level sns.histplot(data, stat='probability', ax=ax), because sns.displot is figure-level and creates its own figure.

import seaborn as sns

Then use the following code:

sns.displot(data, stat='probability')

Also, sns.displot has many parameters that make it easy to draw complex, informative plots. They are described here: displot documentation

You can use functools.partial to avoid using global in your example.

Just add n to the function parameters:

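import matplotlib

def to_percent(y, position, n):
    # n is the total number of data points, passed explicitly
    s = str(round(100 * y / n, 3))

    # The percent symbol needs escaping in LaTeX
    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'

    return s + '%'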

and then create a partial function of two arguments that you can pass to FuncFormatter:

from functools import partial
from matplotlib.ticker import FuncFormatter

percent_formatter = partial(to_percent,
                            n=len(data))
formatter = FuncFormatter(percent_formatter)

Full code:

from functools import partial

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]


def to_percent(y, position, n):
    s = str(round(100 * y / n, 3))

    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'

    return s + '%'


def plotting_hist(data):    
    f, ax = plt.subplots(figsize=(10, 5))
    ax.hist(data, 
            bins=len(set(data)), 
            rwidth=1)

    percent_formatter = partial(to_percent,
                                n=len(data))
    formatter = FuncFormatter(percent_formatter)
    plt.gca().yaxis.set_major_formatter(formatter)

    plt.show()


plotting_hist(data)

gives:

[Image: histogram with percentage labels on the y-axis]

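Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.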

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM