How to plot a fitted curve over a categorical boxplot in PyPlot? Why does the result differ from the same plot in Google Sheets?

I have the following CSV data:

Dataset Size,MAPE,MAE,STD MAPE,STD MAE
35000,0.0715392337,23.38300578,0.9078698348,2.80407539
26250,0.06893431034,22.34732326,0.9833948236,1.926517044
17500,0.0756695622,26.0900766,0.6055443674,8.842862631
8750,0.07176532526,23.02646184,0.8284005282,2.190506033
4200,0.08661127364,29.89234607,0.9395831421,7.587818412
2100,0.08072315267,27.20110884,0.03956974712,4.948606892
1050,0.07505202908,27.04025924,0.841966778,4.550482956
700,0.07703248113,26.17923045,0.4468447145,1.523638508
350,0.08695408769,32.35331585,0.7891190087,4.18648457
200,0.09770903032,30.96197823,0.04648972591,3.892800694
170,0.1202382169,41.87828814,0.7257680584,6.70453713
150,0.1960949784,77.20321559,0.5661066006,21.57418682

From the above data, I would like to generate the following plot using matplotlib or similar (seaborn, pandas, etc.):

[Figure: example plot generated in Google Sheets]

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit


def exponenial_func(x, a, b, c):
    return a*np.exp(-b*x)+c


def myplot(data_file):
    df = pd.read_csv(data_file)
    print(df.head())

    fig, ax = plt.subplots()

    # Exponential fit against the bar positions 0, 1, 2, ...
    x = np.arange(len(df['Dataset Size']), dtype=float)
    popt, pcov = curve_fit(exponenial_func, x, df['MAPE'], p0=(0, 0.0145, 0.0823))
    xp = np.linspace(0, len(df['Dataset Size']), 100)
    ax.plot(xp, exponenial_func(xp, *popt), color='g')

    # Bar plot with error bars
    ax.bar([str(s) for s in df['Dataset Size']], df['MAPE'], yerr=df['STD MAPE'])
    ax.set_title('Accuracy of Model vs. Dataset Size')
    ax.set_xlabel('Dataset Size')
    ax.set_ylabel('Mean Absolute Percentage Error')
    fig.tight_layout()
    plt.show()

The plot that I get looks as follows:

[Figure: plot generated by the code above]

Why do I end up with a line rather than a curve from my code despite fitting an exponential function to the data? (The Google Sheets plot does the same thing, i.e. it fits an exponential curve to the data.)

I played around with some functions, and I think I can say with some degree of certainty that the Google Sheets exponential function has a form close to this:

def sheetey_exponential_function(x, a, b, c):
    return a * b ** (x + c)
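
One thing worth noting about this form: since a * b ** (x + c) equals (a * b ** c) * b ** x, the parameters a and c are not independent, and the whole family reduces to A * exp(k * x) with A = a * b ** c and k = ln(b). The sketch below is an illustration under assumptions, not a confirmed description of what Google Sheets does: it fits that two-parameter form by ordinary least squares on log(MAPE) against the bar positions 0..11, using the MAPE column from the question. The names `exp_trend`, `A` and `k` are made up for this example.

```python
import numpy as np

# MAPE column from the question, in the order the bars are drawn.
mape = np.array([
    0.0715392337, 0.06893431034, 0.0756695622, 0.07176532526,
    0.08661127364, 0.08072315267, 0.07505202908, 0.07703248113,
    0.08695408769, 0.09770903032, 0.1202382169, 0.1960949784,
])

# Assumption: x is the bar position (0, 1, 2, ...), not the dataset size.
x = np.arange(len(mape), dtype=float)


def sheetey_exponential_function(x, a, b, c):
    return a * b ** (x + c)


# Log-linear least squares: log(y) = log(A) + k * x.
k, log_A = np.polyfit(x, np.log(mape), 1)
A = np.exp(log_A)


def exp_trend(x):
    """Fitted curve in the reduced form A * exp(k * x)."""
    return A * np.exp(k * x)


# The same curve written in the a * b ** (x + c) form:
# b = e**k, and c can be chosen freely as long as a * b**c == A.
b = np.exp(k)
c = 2.0
a = A / b ** c
```

Plotting `exp_trend(np.linspace(0, len(mape) - 1, 100))` over the bars gives a smooth curve; any choice of `c` with the matching `a` produces the identical line, which is why fitting all three parameters at once tends to be ill-conditioned.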


The problem is that the horizontal axis is not linear; it is actually inversely linear. So if you want your fit to look like an exponential function, you need to replace x with 1/x:

def exponenial_func(x, a, b, c):
    return a*np.exp(-b/x)+c

The result is the following:

[Figure: resulting plot]
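
For comparison, here is a minimal sketch of the same idea with a model that is linear in 1/x, so it can be fitted with plain least squares rather than `curve_fit`. This is a swapped-in simplification, not the function above: it assumes x is the actual dataset size (not the bar position) and fits MAPE ≈ intercept + slope / size. The helper `draw_on` and the variable names are hypothetical.

```python
import numpy as np

# Columns from the question's CSV, in the order the bars are drawn.
sizes = np.array([35000, 26250, 17500, 8750, 4200, 2100,
                  1050, 700, 350, 200, 170, 150], dtype=float)
mape = np.array([
    0.0715392337, 0.06893431034, 0.0756695622, 0.07176532526,
    0.08661127364, 0.08072315267, 0.07505202908, 0.07703248113,
    0.08695408769, 0.09770903032, 0.1202382169, 0.1960949784,
])

# Least-squares fit of mape = intercept + slope * (1 / size).
slope, intercept = np.polyfit(1.0 / sizes, mape, 1)

# Model value at each dataset size, one per bar.
fitted = intercept + slope / sizes


def draw_on(ax):
    """Draw the bars and the fitted values on an existing matplotlib Axes."""
    labels = [str(int(s)) for s in sizes]
    ax.bar(labels, mape)
    # Reusing the same string labels places each fitted value on its bar.
    ax.plot(labels, fitted, color="g", marker="o")
```

With matplotlib imported as `plt`, `fig, ax = plt.subplots(); draw_on(ax); plt.show()` would display it. Because the bars are equally spaced categories, the line bends only as much as the fitted values change from one bar to the next.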
