How can Python use n, min, max, mean, std, 25%, 50%, 75%, skew, and kurtosis to define a pseudo-random probability density estimate/function?
While reading about and experimenting with numpy.random, I can't seem to find or create what I need: a 10-parameter Python pseudo-random value generator taking count, min, max, mean, standard deviation, 25th percentile, 50th percentile (median), 75th percentile, skew, and kurtosis.
From https://docs.python.org/3/library/random.html I see that the available distributions (uniform, normal/Gaussian, lognormal, negative exponential, and so on) are each tied to a distribution family; none is defined by my 10 parameters alone, with no family involved.
Is there documentation for, or an author of, something like numpy.random.xxxxxx(n, min, max, mean, sd, 25%, 50%, 75%, skew, kurtosis), or the closest existing source code that I might modify to achieve this goal?
This would, in a way, be the inverse of describe(), extended with skew and kurtosis. I could loop or optimize until a randomly generated set of numbers satisfies a criterion, though that might take unlimited time to match all 10 of my parameters.
I have found optim in R, which generates a data set, but I have so far been unable to extend the number of parameters in the R optim source code, or to duplicate it with Python scipy.optimize or similar; these still depend on methods (distribution families) rather than directly pseudo-randomly creating a data set from my 10 parameters:
m0 <- 20
sd0 <- 5
min <- 1
max <- 45
n <- 15
set.seed(1)
mm <- min:max
x0 <- sample(mm, size=n, replace=TRUE)
objfun <- function(x) {(mean(x)-m0)^2+(sd(x)-sd0)^2}
candfun <- function(x) {x[sample(n, size=1)] <- sample(mm, size=1)
return(x)}
objfun(x0) ##INITIAL RESULT:83.93495
o1 <- optim(par=x0, fn=objfun, gr=candfun, method="SANN", control=list(maxit=1e6))
mean(o1$par) ##INITIAL RESULT:20
sd(o1$par) ##INITIAL RESULT:5
plot(table(o1$par))
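For comparison, the R approach above can be sketched in Python without scipy.optimize at all, using a greedy random-swap search as a simplified stand-in for optim(method="SANN") (the function name match_mean_sd and all constants here are illustrative, mirroring the R script):

```python
import numpy as np

def match_mean_sd(n=15, lo=1, hi=45, m0=20, sd0=5, iters=100000, seed=1):
    """Greedy random-swap search (a simplified stand-in for R's
    optim(method="SANN") with the candfun above): repeatedly replace
    one element with a random integer in [lo, hi] and keep the change
    whenever it does not increase the loss
    (mean(x) - m0)^2 + (sd(x) - sd0)^2."""
    rng = np.random.default_rng(seed)
    x = rng.integers(lo, hi + 1, size=n).astype(float)

    def loss(v):
        # ddof=1 matches R's sd(), which uses the sample standard deviation
        return (v.mean() - m0) ** 2 + (v.std(ddof=1) - sd0) ** 2

    best = loss(x)
    for _ in range(iters):
        i = rng.integers(n)          # pick a random position
        old = x[i]
        x[i] = rng.integers(lo, hi + 1)  # propose a random replacement
        new = loss(x)
        if new <= best:
            best = new               # keep the improvement
        else:
            x[i] = old               # revert the change
    return x

sample = match_mean_sd()
print(sample.mean(), sample.std(ddof=1))
```

Like the R version, this only targets the mean and standard deviation; extending the loss with skew and kurtosis terms is straightforward, but, as noted in the question, convergence on all 10 parameters at once is not guaranteed.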
The most general way to generate a random number that follows a distribution is:
1. Generate a uniform random number in [0, 1) (for example, with numpy.random.random()).
2. Take the inverse CDF (ICDF(x)) of that uniform number.
The result is a number that follows the distribution.
In your case, the inverse CDF (ICDF(x)) is already partially determined by five of your parameters — the minimum, maximum, and three percentiles — as follows:
ICDF(0) = minimum; ICDF(0.25) = 25th percentile; ICDF(0.5) = 50th percentile (median); ICDF(0.75) = 75th percentile; ICDF(1) = maximum.
So you already have some idea of what the inverse CDF looks like. All you have to do now is somehow optimize the inverse CDF for the other parameters (mean, standard deviation, skewness, and kurtosis). For example, you can "fill in" the inverse CDF at additional percentiles and see how well the result matches the parameters you're after. In that sense, a good starting guess is a linear interpolation of the percentiles just mentioned. Another thing to keep in mind is that an inverse CDF can never "go down" — it must be non-decreasing.
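A minimal sketch of this inverse-transform idea, before any optimization, is just the linear interpolation through the five known points (the percentile values below are hypothetical, borrowed from the example at the end of this answer):

```python
import numpy as np
from scipy.interpolate import interp1d

# Known points of the inverse CDF: (probability, value).
probs = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [39, 116, 147, 186, 401]  # min, p25, p50, p75, max

# A linear interpolation between the known percentiles is a
# reasonable first guess at the inverse CDF (and is monotonic
# as long as the percentile values are increasing).
icdf = interp1d(probs, values, kind="linear")

# Inverse-transform sampling: push uniform draws through the ICDF.
rng = np.random.default_rng(0)
sample = icdf(rng.random(10000))
print(np.percentile(sample, [0, 25, 50, 75, 100]))
```

The resulting sample reproduces the five input percentiles by construction, but its mean, standard deviation, skew, and kurtosis are whatever the interpolation implies — which is exactly why the remaining parameters need to be optimized, as the code below does.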
The following code shows one solution. It performs these steps:
1. Calculates an initial guess for the inverse CDF: a cubic interpolation through the known percentiles.
2. Passes the loss function (_lossfunc), the initial guess, the bounds, and the other parameters to SciPy's scipy.optimize.minimize method for optimization, adjusting the result between rounds to keep it monotonic and pinned to the known percentiles.
import scipy.stats.mstats as mst
from scipy.optimize import minimize
from scipy.interpolate import interp1d
import numpy
# Define the loss function, which compares the calculated
# and ideal parameters
def _lossfunc(x, *args):
mean, stdev, skew, kurt, chunks = args
st = (
(numpy.mean(x) - mean) ** 2
+ (numpy.sqrt(numpy.var(x)) - stdev) ** 2
+ ((mst.skew(x) - skew)) ** 2
+ ((mst.kurtosis(x) - kurt)) ** 2
)
return st
def adjust(rx, percentiles):
eps = (max(rx) - min(rx)) / (3.0 * len(rx))
# Make result monotonic
for i in range(1, len(rx)):
if (
i - 2 >= 0
and rx[i - 2] < rx[i - 1]
and rx[i - 1] >= rx[i]
and rx[i - 2] < rx[i]
):
rx[i - 1] = (rx[i - 2] + rx[i]) / 2.0
elif rx[i - 1] >= rx[i]:
rx[i] = rx[i - 1] + eps
# Constrain to percentiles
for pi in range(1, len(percentiles)):
previ = percentiles[pi - 1][0]
prev = rx[previ]
curr = rx[percentiles[pi][0]]
prevideal = percentiles[pi - 1][1]
currideal = percentiles[pi][1]
realrange = max(eps, curr - prev)
idealrange = max(eps, currideal - prevideal)
for i in range(previ + 1, percentiles[pi][0]):
if rx[i] >= currideal or rx[i] <= prevideal:
rx[i] = (
prevideal
+ max(eps * (i - previ + 1 + 1), rx[i] - prev) * idealrange / realrange
)
rx[percentiles[pi][0]] = currideal
# Make monotonic again
for pi in range(1, len(percentiles)):
previ = percentiles[pi - 1][0]
curri = percentiles[pi][0]
for i in range(previ+1, curri+1):
if (
i - 2 >= 0
and rx[i - 2] < rx[i - 1]
and rx[i - 1] >= rx[i]
and rx[i - 2] < rx[i]
and i-1!=previ and i-1!=curri
):
rx[i - 1] = (rx[i - 2] + rx[i]) / 2.0
elif rx[i - 1] >= rx[i] and i!=curri:
rx[i] = rx[i - 1] + eps
return rx
# Calculates an inverse CDF for the given nine parameters.
def _get_inverse_cdf(mn, p25, p50, p75, mx, mean, stdev, skew, kurt, chunks=100):
if chunks < 0:
raise ValueError
# Minimum of 16 chunks
chunks = max(16, chunks)
# Round chunks up to closest multiple of 4
if chunks % 4 != 0:
chunks += 4 - (chunks % 4)
# Calculate initial guess for the inverse CDF; an
# interpolation of the inverse CDF through the known
# percentiles
interp = interp1d([0, 0.25, 0.5, 0.75, 1.0], [mn, p25, p50, p75, mx], kind="cubic")
rnge = mx - mn
x = interp(numpy.linspace(0, 1, chunks + 1))
# Bounds, taking percentiles into account
bounds = [(mn, mx) for i in range(chunks + 1)]
percentiles = [
[0, mn],
[int(chunks * 1 / 4), p25],
[int(chunks * 2 / 4), p50],
[int(chunks * 3 / 4), p75],
[int(chunks), mx],
]
for p in percentiles:
bounds[p[0]] = (p[1], p[1])
# Other parameters
otherParams = (mean, stdev, skew, kurt, chunks)
# Optimize the result for the given parameters
# using the initial guess and the bounds
result = minimize(
_lossfunc, # Loss function
x, # Initial guess
otherParams, # Arguments
bounds=bounds,
)
rx = result.x
if result.success:
adjust(rx, percentiles)
# Minimize again
result = minimize(
_lossfunc, # Loss function
rx, # Initial guess
otherParams, # Arguments
bounds=bounds,
)
rx = result.x
adjust(rx, percentiles)
# Minimize again
result = minimize(
_lossfunc, # Loss function
rx, # Initial guess
otherParams, # Arguments
bounds=bounds,
)
rx = result.x
# Calculate interpolating function of result
ls = numpy.linspace(0, 1, chunks + 1)
success = result.success
icdf=interp1d(ls, rx, kind="linear")
# == To check the quality of the result
if False:
meandiff = numpy.mean(rx) - mean
stdevdiff = numpy.sqrt(numpy.var(rx)) - stdev
print(meandiff)
print(stdevdiff)
print(mst.skew(rx)-skew)
print(mst.kurtosis(rx)-kurt)
print(icdf(0)-percentiles[0][1])
print(icdf(0.25)-percentiles[1][1])
print(icdf(0.5)-percentiles[2][1])
print(icdf(0.75)-percentiles[3][1])
print(icdf(1)-percentiles[4][1])
return (icdf, success)
def random_10params(n, mn, p25, p50, p75, mx, mean, stdev, skew, kurt):
""" Note: Kurtosis as used here is Fisher's kurtosis,
or kurtosis excess. Stdev is square root of numpy.var(). """
# Calculate inverse CDF
icdf, success = (None, False)
tries = 0
# Try up to 10 times to get a converging inverse CDF, increasing the mesh each time
chunks = 500
while tries < 10:
icdf, success = _get_inverse_cdf(mn, p25, p50, p75, mx, mean, stdev, skew, kurt,chunks=chunks)
tries+=1
chunks+=100
if success: break
if not success:
print("Warning: Estimation failed and may be inaccurate")
# Generate uniform random variables
npr=numpy.random.random(size=n)
# Transform them with the inverse CDF
return icdf(npr)
Example:
print(random_10params(n=1000, mn=39, p25=116, p50=147, p75=186, mx=401, mean=154.1207, stdev=52.3257, skew=.7083, kurt=.5383))
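One way to sanity-check such output is to compare the sample's moments against the targets with scipy.stats.describe. The helper below is a sketch (moment_diffs is an illustrative name); for lack of the full generator here, it is demonstrated on a plain normal sample, whose target skew and Fisher kurtosis are both 0:

```python
import numpy as np
from scipy import stats

def moment_diffs(sample, mean, stdev, skew, kurt):
    """Return the gap between a sample's moments and the targets.
    scipy.stats.describe reports Fisher (excess) kurtosis, matching
    the convention used by random_10params above."""
    d = stats.describe(sample)
    return {
        "mean": d.mean - mean,
        "stdev": float(np.sqrt(d.variance)) - stdev,
        "skew": d.skewness - skew,
        "kurt": d.kurtosis - kurt,
    }

rng = np.random.default_rng(1)
diffs = moment_diffs(rng.normal(154.1207, 52.3257, 100000),
                     154.1207, 52.3257, 0.0, 0.0)
print(diffs)  # each entry should be near 0 for a large sample
```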
One last note: if you have access to the underlying data points, and not just their statistics, there are other methods you can use to sample from the distribution those data points form. Examples include kernel density estimation, histograms, and regression models (particularly for time series data). See also "Generate random data based on existing data".
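For instance, the kernel-density route is a one-liner with scipy.stats.gaussian_kde (a sketch; the gamma draw below merely stands in for real observed data):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Stand-in for real observed data points.
rng = np.random.default_rng(2)
data = rng.gamma(shape=4.0, scale=40.0, size=500)

# Fit a Gaussian kernel density estimate to the points, then draw
# new pseudo-random values from the estimated distribution.
kde = gaussian_kde(data)
new_points = kde.resample(1000)[0]  # resample returns shape (d, size)

print(new_points.mean(), data.mean())  # the two means should be close
```

Unlike the 10-parameter approach above, this needs no optimization at all, but it only works when the raw data (not just its summary statistics) is available.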