
Pandas plot density plot from frequency table

Let's say I have a DataFrame that looks (simplified) like this:

>>> df
    freq 
2      2   
3     16  
1     25  

where the index column represents a value, and the freq column represents the frequency of occurrence of that value, as in a frequency table.

I'd like to plot a density plot for this table, like the one obtained from plot kind kde. However, this kind is apparently only meant for a pd.Series of raw observations. My df is too large to flatten out to a 1D Series, i.e. df = [2, 2, 3, 3, 3, ..., 1, 1]. How can I plot such a density plot under these circumstances?

I know you asked about the case where df is too large to flatten out, but the following answer works when it is not:

pd.Series(df.index.repeat(df.freq)).plot.kde()

Or more generally, when the values are in a column called val rather than in the index:

df.val.repeat(df.freq).plot.kde()
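If the table really is too large to expand, one possible alternative (a sketch, not part of the answer above) is a weighted kernel density estimate. It assumes SciPy 1.2 or later, where scipy.stats.gaussian_kde accepts a weights argument, and it rebuilds the small example table from the question for illustration. With weights, the default bandwidth is computed from the weights, so the curve need not match the one drawn from the expanded Series exactly.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Frequency table from the question: index = value, freq = count.
df = pd.DataFrame({"freq": [2, 16, 25]}, index=[2, 3, 1])

# Weighted KDE: each distinct value contributes in proportion to its count,
# so the table never has to be expanded into one row per observation.
values = df.index.to_numpy(dtype=float)
weights = df["freq"].to_numpy(dtype=float)
kde = gaussian_kde(values, weights=weights)

# Evaluate the estimate on a grid that extends well past the observed range.
grid = np.linspace(values.min() - 5, values.max() + 5, 1001)
density = kde(grid)

# To draw the curve:
#   import matplotlib.pyplot as plt
#   plt.plot(grid, density); plt.show()
```

The memory cost here scales with the number of distinct values rather than with the total count. Seaborn's kdeplot also takes a weights argument, which may be more convenient if you already plot with Seaborn.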

You can plot a density distribution using a bar plot if you normalize the y values by the size of the population, i.e. the sum of the frequencies. With bars of width 1, this makes the total area covered by the bars equal to 1.

import matplotlib.pyplot as plt

plt.bar(
    df.index,
    df.freq / df.freq.sum(),
    width=-1,
    align='edge'
)

The width and align parameters make sure each bar covers the interval (k-1, k]: a negative width combined with align='edge' places each bar to the left of its x value.
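As a quick check of the normalization, the total bar area can be computed directly. This is a minimal sketch that rebuilds the example table from the question and assumes the values are spaced one unit apart; for other spacings the heights would also need dividing by the bar width.

```python
import pandas as pd

# Frequency table from the question: index = value, freq = count.
df = pd.DataFrame({"freq": [2, 16, 25]}, index=[2, 3, 1])

# Bar heights are relative frequencies.
heights = df["freq"] / df["freq"].sum()

# Each bar has width 1, so the total area is the sum of height * width.
bar_width = 1.0
area = float((heights * bar_width).sum())
print(area)  # should be 1 up to floating-point rounding
```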

Somebody with better knowledge of statistics should answer whether kernel density estimation actually makes sense for discrete distributions.

Maybe this will work:

import matplotlib.pyplot as plt

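# Sort by value first; with an unsorted index the line would zigzag.
counts = df['freq'].sort_index()
plt.plot(counts.index, counts)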

plt.show()

Seaborn is built on top of Matplotlib for this kind of plot, and it can automatically compute a kernel density estimate if you want.

import numpy as np
import pandas as pd
import seaborn as sns

x = pd.Series(np.random.randint(0, 20, size = 10000), name = 'freq')

# distplot is deprecated since seaborn 0.11; histplot with kde=True replaces it.
sns.histplot(x, kde=True)
