Pandas plot density plot from frequency table

Question

Let's say I have a DataFrame that looks (simplified) like this

where the index column represents a value, and the freq column represents the frequency of occurrence of that value, as in a frequency table.

I'd like to plot a density plot for this table, like the one produced by plot(kind='kde'). However, this kind is apparently only meant for pd.Series. My df is too large to flatten out to a 1D Series, i.e. df = [2, 2, 3, 3, 3, ..., 1, 1]. How can I plot such a density plot under these circumstances?

Answer 1

I know you have asked for the case where df is too large to flatten out, but the following answer works where this isn't the case:

pd.Series(df.index.repeat(df.freq)).plot.kde()

Or more generally, when the values are in a column called val and not the index:

df.val.repeat(df.freq).plot.kde()
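If the expanded Series really is too large to build, one possible alternative is a weighted kernel density estimate, which places one kernel on each distinct value and weights it by its frequency. The sketch below assumes SciPy 1.2 or later (where gaussian_kde accepts a weights argument) and uses a small made-up table; the names values, weights, grid and the padding of 10 are illustrative choices, not part of the original answer. The automatically chosen bandwidth may differ from the one pandas picks for the expanded data, so the curve need not match plot.kde() exactly.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Hypothetical frequency table: index = value, freq = number of occurrences.
df = pd.DataFrame({"freq": [2, 3, 1, 4]}, index=[1, 2, 3, 5])

values = df.index.to_numpy(dtype=float)
weights = df["freq"].to_numpy(dtype=float)

# One kernel per distinct value, weighted by its frequency,
# so the table never has to be expanded with repeat().
kde = gaussian_kde(values, weights=weights)

# Evaluate the estimate on a grid padded well beyond the data range
# (the padding of 10 is an arbitrary choice for this example).
grid = np.linspace(values.min() - 10, values.max() + 10, 1000)
density = kde(grid)
```

Passing grid and density to plt.plot(grid, density) then draws the curve. The memory cost depends on the number of distinct values, not on the sum of the frequencies.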

Answer 2

You can plot a density distribution using a bar plot if you normalize the y values by the total frequency (the size of the population). With bars of width 1, this makes the area covered by the bars equal to 1.

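import matplotlib.pyplot as plt

# Bar heights are relative frequencies, so unit-width bars have total area 1
plt.bar(
    df.index,
    df.freq / df.freq.sum(),
    width=-1,
    align='edge'
)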

The width and align parameters are to make sure each bar covers the interval (k-1, k].
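As a quick check of the area claim, the normalization can be sketched on a small made-up table. The table, the variable names and the step of 0.5 are assumptions for illustration only. The second half shows how the heights could be rescaled if the values were uniformly spaced by something other than 1.

```python
import pandas as pd

# Hypothetical frequency table with integer values spaced 1 apart.
df = pd.DataFrame({"freq": [2, 3, 1, 4]}, index=[1, 2, 3, 4])

# Relative frequencies: each bar height is the share of that value.
heights = df["freq"] / df["freq"].sum()

# With bars of width 1, total area = sum(height * width) = 1.
unit_area = float((heights * 1.0).sum())

# If the values were spaced `step` apart instead (assumed uniform),
# dividing by the step as well keeps the total bar area at 1.
step = 0.5
density_heights = df["freq"] / (df["freq"].sum() * step)
scaled_area = float((density_heights * step).sum())
```

In the rescaled case the bars would be drawn with width=-step rather than width=-1.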

Somebody with better knowledge of statistics should answer whether kernel density estimation actually makes sense for discrete distributions.

Answer 3

Maybe this will work:

import matplotlib.pyplot as plt

plt.plot(df.index, df['freq'])

plt.show()
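Note that the line above shows raw frequencies, not a density. A minimal sketch of one way to rescale it, assuming a small made-up table and a hand-written trapezoid rule, is to divide the frequencies by the area under the line so that the rescaled curve encloses an area of 1. This is only a piecewise-linear approximation, not a kernel density estimate.

```python
import numpy as np
import pandas as pd

# Hypothetical frequency table: index = value, freq = number of occurrences.
df = pd.DataFrame({"freq": [2, 3, 1, 4]}, index=[1, 2, 3, 5])

# Sort by value so the line is drawn left to right.
table = df.sort_index()
x = table.index.to_numpy(dtype=float)
y = table["freq"].to_numpy(dtype=float)

# Area under the piecewise-linear frequency curve (trapezoid rule).
area = float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

# Rescaled heights: the area under the line through (x, density) is 1.
density = y / area
```

Calling plt.plot(x, density) instead of plt.plot(df.index, df['freq']) then gives a curve on a density scale.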

Answer 4

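Seaborn is built on top of Matplotlib and can compute and draw a kernel density estimate for you. Note that this example uses raw samples rather than a frequency table, and that distplot has been deprecated since seaborn 0.11 in favor of histplot and displot.

import numpy as np
import pandas as pd
import seaborn as sns

x = pd.Series(np.random.randint(0, 20, size=10000), name='freq')

# Histogram normalized to a density, with a KDE curve overlaid
sns.histplot(x, kde=True, stat='density')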

4 answers

solution1
1 2019-08-29 13:27:42

solution2
1 2019-08-29 13:49:02

solution3
0 2015-12-09 21:07:45

solution4
0 2019-08-29 15:59:39
