
How to plot a fitted curve over a categorical boxplot in PyPlot? Why does the result differ from the same plot in Google Sheets?

I have the following csv data:

Dataset Size,MAPE,MAE,STD MAPE,STD MAE
35000,0.0715392337,23.38300578,0.9078698348,2.80407539
26250,0.06893431034,22.34732326,0.9833948236,1.926517044
17500,0.0756695622,26.0900766,0.6055443674,8.842862631
8750,0.07176532526,23.02646184,0.8284005282,2.190506033
4200,0.08661127364,29.89234607,0.9395831421,7.587818412
2100,0.08072315267,27.20110884,0.03956974712,4.948606892
1050,0.07505202908,27.04025924,0.841966778,4.550482956
700,0.07703248113,26.17923045,0.4468447145,1.523638508
350,0.08695408769,32.35331585,0.7891190087,4.18648457
200,0.09770903032,30.96197823,0.04648972591,3.892800694
170,0.1202382169,41.87828814,0.7257680584,6.70453713
150,0.1960949784,77.20321559,0.5661066006,21.57418682

From the above data, I would like to generate the following plot using matplotlib or similar (seaborn, pandas, etc.):

[Image: example plot generated in Google Sheets]

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit


def exponenial_func(x, a, b, c):
    return a * np.exp(-b * x) + c


def myplot(data_file):
    df = pd.read_csv(data_file)
    print(df.head())

    fig, ax = plt.subplots()

    # Exponential fit against the bar positions 0..n-1 (not the dataset sizes)
    x_pos = np.arange(len(df), dtype=float)
    popt, pcov = curve_fit(exponenial_func, x_pos, df['MAPE'], p0=(0, 0.0145, 0.0823))
    xp = np.linspace(0, len(df), 100)
    ax.plot(xp, exponenial_func(xp, *popt), color='g')

    # Bar plot with error bars; the string labels make the x axis categorical
    ax.bar([str(s) for s in df['Dataset Size']], df['MAPE'], yerr=df['STD MAPE'])
    ax.set_title('Accuracy of Model vs. Dataset Size')
    ax.set_xlabel('Dataset Size')
    ax.set_ylabel('Mean Absolute Percentage Error')
    fig.tight_layout()
    plt.show()

The plot that I get looks as follows: [Image: plot generated by the code above]

Why do I end up with a line rather than a curve, even though I fit an exponential function to the data? (The Google Sheets plot does the same thing, i.e. it fits an exponential curve to the data.)

I played around with some functions, and I think I can say with some degree of certainty that the Google Sheets exponential function has a form close to this:

def sheetey_exponential_function(x, a, b, c):
    return a * b ** (x + c)
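
One thing worth noting about this guessed form, as a minimal sketch: for b > 0 it is algebraically the same as a two-parameter exponential A * exp(k * x), with A = a * b**c and k = ln(b). The names standard_exponential and to_standard_form below are made up for illustration, and the parameter values are arbitrary. A consequence is that a and c cannot be pinned down independently by a fit, so the two-parameter form is probably the easier one to hand to curve_fit.

```python
import numpy as np


def sheetey_exponential_function(x, a, b, c):
    return a * b ** (x + c)


def standard_exponential(x, A, k):
    # Hypothetical helper: the usual two-parameter exponential.
    return A * np.exp(k * x)


def to_standard_form(a, b, c):
    # a * b**(x + c) == (a * b**c) * exp(ln(b) * x), valid only for b > 0.
    if b <= 0:
        raise ValueError("b must be positive")
    return a * b ** c, np.log(b)


# Arbitrary illustrative parameters, not fitted to the data in the question.
x = np.linspace(0.0, 11.0, 12)
a, b, c = 0.05, 1.1, 2.0
A, k = to_standard_form(a, b, c)

three_param = sheetey_exponential_function(x, a, b, c)
two_param = standard_exponential(x, A, k)
same_curve = np.allclose(three_param, two_param)
```

Any (a, c) pair with the same product a * b**c gives exactly the same curve, which is why the three-parameter version is redundant.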

[Image]

The problem is that the horizontal axis is not linear: the bars are evenly spaced, but the dataset sizes they represent are not. The axis is closer to an inverse scale. So if you want your fit to look like an exponential function, you need to replace x with 1/x:

def exponenial_func(x, a, b, c):
    # x must not contain 0 here, because b/x is undefined at x = 0
    return a * np.exp(-b / x) + c

The result is the following: [Image]
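
A related way to deal with the non-linear axis, sketched here under stated assumptions: fit against the real dataset sizes instead of the bar positions, then draw the fitted values at the bar positions. The model below, MAPE ≈ intercept + slope / size, is a simpler stand-in chosen for illustration and is not the exponential above. It is linear in 1/size, so np.polyfit can solve it directly without an initial guess. The plot_fit helper is hypothetical and is only defined, not called.

```python
import numpy as np

# Dataset sizes and MAPE values copied from the CSV in the question.
sizes = np.array([35000, 26250, 17500, 8750, 4200, 2100,
                  1050, 700, 350, 200, 170, 150], dtype=float)
mape = np.array([0.0715392337, 0.06893431034, 0.0756695622, 0.07176532526,
                 0.08661127364, 0.08072315267, 0.07505202908, 0.07703248113,
                 0.08695408769, 0.09770903032, 0.1202382169, 0.1960949784])

# Assumed model (not the answer's exponential): MAPE ~ intercept + slope / size.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(1.0 / sizes, mape, deg=1)

# Evaluate the model at each real dataset size; fitted[i] belongs to bar i.
fitted = intercept + slope / sizes
positions = np.arange(len(sizes))


def plot_fit():
    """Draw the bars at categorical positions and overlay the fitted values."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.bar(positions, mape)
    ax.set_xticks(positions)
    ax.set_xticklabels([str(int(s)) for s in sizes], rotation=45)
    ax.plot(positions, fitted, color="g", marker="o")
    ax.set_xlabel("Dataset Size")
    ax.set_ylabel("Mean Absolute Percentage Error")
    fig.tight_layout()
    plt.show()
```

Calling plot_fit() should show the green line bending upward toward the small dataset sizes, because each fitted value is computed from the actual size behind its bar rather than from the bar's index.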
