Let's say I have a DataFrame that looks (simplified) like this
>>> df
freq
2 2
3 16
1 25
where the index represents a value, and the freq column represents the frequency of occurrence of that value, as in a frequency table.
I'd like to draw a density plot for this table, like the one obtained from plot kind kde. However, this kind is apparently only meant for pd.Series. My df is too large to flatten out to a 1D Series, i.e. df = [2, 2, 3, 3, 3, ..., 1, 1]. How can I plot such a density plot under these circumstances?
I know you have asked for the case where df is too large to flatten out, but the following answer works where this isn't the case:
pd.Series(df.index.repeat(df.freq)).plot.kde()
Or more generally, when the values are in a column called val and not the index:
df.val.repeat(df.freq).plot.kde()
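When the table really is too large to flatten, one option is to weight each distinct value by its frequency instead of repeating it. The block below is a minimal sketch of that idea using only NumPy. The helper name weighted_gaussian_kde, the evaluation grid, and the fixed bandwidth of 0.5 are assumptions made for illustration; pandas' plot.kde picks its bandwidth differently, so the curve will not match it exactly.

```python
import numpy as np

def weighted_gaussian_kde(values, weights, grid, bandwidth):
    """Evaluate a Gaussian KDE on `grid`, weighting each value by its frequency."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)
    weights = weights / weights.sum()  # normalise so the density integrates to 1
    # Standardised distance from every grid point to every distinct value.
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights  # weighted sum of one kernel per distinct value

# Frequency table from the question: index values and their counts.
values = np.array([2, 3, 1])
freq = np.array([2, 16, 25])

grid = np.linspace(-2.0, 6.0, 801)  # assumed plotting range
density = weighted_gaussian_kde(values, freq, grid, bandwidth=0.5)  # bandwidth is an arbitrary choice
```

The result can be drawn with plt.plot(grid, density). Memory use scales with the number of distinct values rather than the total count. scipy.stats.gaussian_kde also accepts a weights argument (SciPy 1.2 and later), which may be more convenient if SciPy is available.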
You can plot a density distribution using a bar plot if you normalize the y values by the size of the population (the sum of the frequencies). With bars of width 1, this makes the total area covered by the bars equal to 1.
plt.bar(
df.index,
df.freq / df.freq.sum(),
width=-1,
align='edge'
)
The width and align parameters are to make sure each bar covers the interval (k-1, k].
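As a quick check of the normalization, the sketch below recomputes the bar heights for the question's table and sums the bar areas. The arrays stand in for df.index and df.freq, and bar_width is a name introduced here for illustration.

```python
import numpy as np

values = np.array([2, 3, 1])   # stands in for df.index
freq = np.array([2, 16, 25])   # stands in for df.freq

bar_width = 1.0                      # each bar spans (k-1, k]
heights = freq / freq.sum()          # relative frequencies
total_area = float((heights * bar_width).sum())
```

If the values were spaced by something other than 1, the heights would also need dividing by the bar width to keep the total area at 1.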
Somebody with better knowledge of statistics should answer whether kernel density estimation actually makes sense for discrete distributions.
Maybe this will work:
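import matplotlib.pyplot as plt
# Sort by value so the line runs left to right instead of zigzagging
df_sorted = df.sort_index()
plt.plot(df_sorted.index, df_sorted['freq'])
plt.show()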
Seaborn was built to do this on top of Matplotlib and automatically calculates kernel density estimates if you want.
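import numpy as np
import pandas as pd
import seaborn as sns
x = pd.Series(np.random.randint(0, 20, size=10000), name='freq')
# distplot is deprecated since seaborn 0.11; histplot with kde=True replaces it
sns.histplot(x, kde=True, stat='density')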