I am drawing a distribution plot in Seaborn with this code:
import seaborn as sns
import matplotlib.pyplot as plt  # needed for the plt.* calls below

df['hour'] = df['time']/3600
plt.figure(figsize=(30,10))
plt.gcf().subplots_adjust(left = 0.3)
g = sns.distplot(df['hour'], axlabel = 'No. Of Hours', label = 'Frequency')
The density curve drawn on top of the histogram looks really weird: it shows a spike for each bar and an increasing trend at the right tail of the distribution, where there is very little data. Is this graph correct? If not, what is wrong with it and how can I correct it? Here is the graph:
This issue is reported at http://github.com/mwaskom/seaborn/issues/1590
It has to do with the KDE algorithm in the statsmodels package, which seaborn uses when statsmodels is installed. You can force seaborn to use scipy's algorithm instead by adding this line before calling distplot:
sns.distributions._has_statsmodels = False
Here is a short snippet that reproduces the issue:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
'hour': [1] + 100 * [2] + [10]
})
plt.figure(figsize=(30,10))
plt.gcf().subplots_adjust(left = 0.3)
g = sns.distplot(df['hour'], axlabel = 'No. Of Hours', label = 'Frequency')
And here's the result if you force it to not use statsmodels:
sns.distributions._has_statsmodels = False
plt.figure(figsize=(30,10))
plt.gcf().subplots_adjust(left = 0.3)
g = sns.distplot(df['hour'], axlabel = 'No. Of Hours', label = 'Frequency')
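If you would rather not rely on the private _has_statsmodels flag, one option is to compute the density yourself with scipy's gaussian_kde and draw it over a normalized histogram. The sketch below is only an illustration on the same toy data: the helper name kde_curve and its num and pad parameters are made up for this example, and the default bandwidth (Scott's rule) is scipy's, so the curve may not match exactly what seaborn draws.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_curve(values, num=512, pad=3.0):
    """Evaluate a Gaussian KDE of `values` on an evenly spaced grid.

    Hypothetical helper for illustration; `num` is the number of grid
    points and `pad` extends the grid by that many bandwidths per side.
    """
    values = np.asarray(values, dtype=float)
    kde = gaussian_kde(values)  # bandwidth defaults to Scott's rule
    # Kernel standard deviation in data units for 1-D input.
    bandwidth = kde.factor * values.std(ddof=1)
    grid = np.linspace(values.min() - pad * bandwidth,
                       values.max() + pad * bandwidth,
                       num)
    return grid, kde(grid)


# Same toy data as the snippet that reproduces the issue.
hours = [1] + 100 * [2] + [10]
grid, density = kde_curve(hours)
```

The returned arrays can then be drawn with plt.hist(hours, density=True) followed by plt.plot(grid, density), which gives a histogram and a smooth curve on the same scale.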