
How to get a Q-Q plot for a Pareto distribution in Python?

QQ plots are used to assess the goodness of fit between a set of data points and a theoretical distribution. The procedure for obtaining the points is as follows.

  1. Select the samples to use. Sort the selected samples, with X(i) denoting the ith sample.
  2. Find the model values that correspond to the samples. This is done in two steps:

    a. Associate each sample with the percentile it represents: p_i = (i - 0.5)/n

    b. Calculate the model value that would be associated with this percentile. This is done by inverting the model CDF, as is done when generating random variates from the model distribution. Thus the model value corresponding to sample i is Finverse(p_i).

  3. Draw the QQ plot, using the n points

( X(i), Finverse(p_i) ), 1 ≤ i ≤ n
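The steps above can be sketched in plain Python for the Pareto case. This is only a minimal illustration, assuming the standard Pareto CDF F(x) = 1 - (x_m/x)^b for x ≥ x_m, whose inverse is x_m (1 - p)^(-1/b). The names `pareto_inverse_cdf` and `qq_points` and the four sample values are made up for the example and are not part of the question's code.

```python
def pareto_inverse_cdf(p, b, x_m=1.0):
    """Inverse of the Pareto CDF F(x) = 1 - (x_m / x)**b, valid for 0 <= p < 1."""
    return x_m * (1.0 - p) ** (-1.0 / b)


def qq_points(samples, b, x_m=1.0):
    """Return the list of (X(i), Finverse(p_i)) pairs for 1 <= i <= n."""
    # Step 1: sort the samples so that xs[i - 1] is X(i)
    xs = sorted(samples)
    n = len(xs)
    points = []
    for i in range(1, n + 1):
        # Step 2a: percentile represented by the i-th sample
        p_i = (i - 0.5) / n
        # Step 2b: model value associated with that percentile
        points.append((xs[i - 1], pareto_inverse_cdf(p_i, b, x_m)))
    return points


# Hypothetical sample values, used only to show the shape of the output
pts = qq_points([1.9, 1.1, 3.2, 1.4], b=2.62)
for x_i, model_i in pts:
    print(f"{x_i:.2f}  {model_i:.4f}")
```

Note that i runs from 1 to n, so every p_i lies strictly between 0 and 1.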

Using this approach I came up with the following Python implementation.

import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as st

_distn_names = ["pareto"]
def fit_to_all_distributions(data):
    dist_names = _distn_names

    params = {}
    for dist_name in dist_names:
        try:
            dist = getattr(st, dist_name)
            param = dist.fit(data)

            params[dist_name] = param
        except Exception:
            print("Error occurred in fitting")
            params[dist_name] = "Error"

    return params 

def get_q_q_plot(values, dist, params):
    values.sort()

    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    x = []

    for i in range(len(values)):
        x.append((i-0.5)/len(values))

    y = getattr(st, dist).ppf(x, loc=loc, scale=scale, *arg)

    y = list(y)

    emp_percentiles = values
    dist_percentiles = y

    print("Empirical Percentiles")
    print(emp_percentiles)

    print("Distribution Percentiles")
    print(dist_percentiles)

    plt.figure()
    plt.xlabel('dist_percentiles')
    plt.ylabel('actual_percentiles')
    plt.title('Q Q plot')
    plt.plot(dist_percentiles, emp_percentiles)
    plt.savefig("/path/q-q-plot.png")

b = 2.62
latencies = st.pareto.rvs(b, size=500)
data = pd.Series(latencies)
params = fit_to_all_distributions(data)

pareto_params = params["pareto"]

get_q_q_plot(latencies, "pareto", pareto_params)

Ideally I should get a straight line, but this is what I get.

[Image: QQ plot]

Why don't I get a straight line? Is there anything wrong in my implementation?

You can get the points of the QQ plot for any distribution (there are 82 in scipy stats) using the following code.

import numpy as np
import scipy.stats as st

def get_q_q_plot(latency_values, distribution):
    # Look up the scipy.stats distribution by name and fit it to the data
    distribution = getattr(st, distribution)
    params = distribution.fit(latency_values)

    # Sort a copy so the caller's data is left untouched
    sorted_values = np.sort(latency_values)
    n = len(sorted_values)

    # fit() returns (shape parameters..., loc, scale)
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    # Percentiles p_i = (i - 0.5) / n for i = 1..n, one per sorted sample
    p = (np.arange(1, n + 1) - 0.5) / n

    # Model values from the inverse CDF of the fitted distribution
    dist_percentiles = distribution.ppf(p, *arg, loc=loc, scale=scale)
    emp_percentiles = sorted_values

    return emp_percentiles, dist_percentiles
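
One visible difference between the question's loop and the function above is where the index i starts. The short check below is only a sketch of that point, not a full diagnosis of the plot. It assumes the shape parameter b = 2.62 from the question, and it uses a sample size of 5 purely to keep the arrays small. With a zero-based index the first percentile is negative, and `scipy.stats` `ppf` returns nan for probabilities outside [0, 1].

```python
import numpy as np
import scipy.stats as st

b = 2.62  # shape parameter used in the question
n = 5     # tiny sample size, chosen only for illustration

# Percentiles with a zero-based index, as produced by range(len(values))
p_zero_based = (np.arange(0, n) - 0.5) / n
# Percentiles with a one-based index, p_i = (i - 0.5) / n for i = 1..n
p_one_based = (np.arange(1, n + 1) - 0.5) / n

q_zero_based = st.pareto.ppf(p_zero_based, b)
q_one_based = st.pareto.ppf(p_one_based, b)

print("first zero-based percentile:", p_zero_based[0])
print("its model value is nan:", np.isnan(q_zero_based[0]))
print("any nan with one-based index:", np.isnan(q_one_based).any())
```

A QQ plot is also commonly drawn as a scatter of the points against the line y = x. The question's `plt.plot` call connects the points with line segments instead.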
