
Seaborn distribution plot line graph shows ringing

In my Seaborn graph with code

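import seaborn as sns
import matplotlib.pyplot as plt  # needed for the plt.figure call below

# df is an existing DataFrame; 'time' is presumably in seconds
df['hour'] = df['time'] / 3600
plt.figure(figsize=(30, 10))
plt.gcf().subplots_adjust(left=0.3)
g = sns.distplot(df['hour'], axlabel='No. Of Hours', label='Frequency')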

the density curve drawn on top of the histogram looks wrong: it spikes at every bar and rises at the right tail of the distribution, where there is very little data. Is this graph correct? If not, what is wrong with it and how can I fix it? Here is the graph: [image]

This issue is reported at http://github.com/mwaskom/seaborn/issues/1590

It has to do with the KDE algorithm used by the statsmodels package. You can force seaborn to use scipy's algorithm instead by adding this line:

sns.distributions._has_statsmodels = False

Here is a short snippet that reproduces the issue:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Almost all values are identical, with one outlier on each side
df = pd.DataFrame({
    'hour': [1] + 100 * [2] + [10]
})
plt.figure(figsize=(30, 10))
plt.gcf().subplots_adjust(left=0.3)
g = sns.distplot(df['hour'], axlabel='No. Of Hours', label='Frequency')

[image]

And here's the result if you force it to not use statsmodels:

sns.distributions._has_statsmodels = False
plt.figure(figsize=(30,10))               
plt.gcf().subplots_adjust(left = 0.3)
g = sns.distplot(df['hour'], axlabel = 'No. Of Hours', label = 'Frequency')

[image]
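If you would rather not depend on a private seaborn flag, the density can also be computed directly with scipy. The following is a minimal sketch, not what seaborn does internally: it assumes the same reproduction data as above, uses `scipy.stats.gaussian_kde` with its default bandwidth (Scott's rule), and the names `hours`, `grid`, `density`, `peak_hour` and `area` are only illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Same values as the reproduction snippet: one 1, a hundred 2s, one 10
hours = np.array([1] + 100 * [2] + [10], dtype=float)

# Gaussian KDE; the default bandwidth is chosen by Scott's rule
kde = gaussian_kde(hours)

# Evaluate the density on a grid that extends a little past the data
grid = np.linspace(hours.min() - 3, hours.max() + 3, 500)
density = kde(grid)

# Sanity checks: where the curve peaks, and roughly how much area it covers
peak_hour = grid[np.argmax(density)]
area = density.sum() * (grid[1] - grid[0])

print(f"peak at about {peak_hour:.2f} hours, area about {area:.3f}")
```

To look at the curve, pass `grid` and `density` to `plt.plot`. It should show a single bump around 2 hours, with small contributions from the two outliers.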
