How can the Weibull PDF parameters be correctly determined from a series of measurements?

Question

Suppose I have a series of hourly measurements, such as the mean wind speed. A start and end date limit the data to a time window. From these data I can calculate the frequency of the values in individual categories. The first category includes all values from 0 to < 0.5 km/h, the second all values from 0.5 to < 1.5 km/h, the third all values from 1.5 to < 2.5 km/h, and so on. Counting all values gives the following overall distribution:

Category    Amount  Frequency (in %)
0-1 km/h    42      0.64
1-2 km/h    444     6.78
2-3 km/h    871     13.30
3-4 km/h    1130    17.25
4-5 km/h    1119    17.08
5-6 km/h    934     14.26
6-7 km/h    703     10.73
7-8 km/h    490     7.48
8-9 km/h    351     5.36
9-10 km/h   219     3.34
10-11 km/h  143     2.18
11-12 km/h  52      0.79
12-13 km/h  13      0.20
13-14 km/h  15      0.23
14-15 km/h  6       0.09
15-16 km/h  6       0.09
16-17 km/h  4       0.06
17-18 km/h  3       0.05
18-19 km/h  4       0.06
20-21 km/h  2       0.03

How can the Weibull scale parameter and the Weibull shape parameter be determined from these values (e.g. with Python, perhaps with the reliability package)?

So far I have only passed all the individual values from the measurement series to Fit_Weibull_2P from the Python reliability package to determine the two parameters. However, the resulting parameters do not seem to be correct (the curve drawn from them later looks wrong), or I am not passing the values to Fit_Weibull_2P correctly.

Does anyone have an idea where my error is, or how this can be solved differently? Perhaps not with the individual values, but with the frequencies?

Answer 1

I don't know what your sample data look like, but this gives a pretty good approximation even using the binned data. Compare (1), without floc=0, with (2), which specifies floc=0 to force the location (the left boundary of the distribution) to be 0.

import numpy as np
from scipy.stats import weibull_min

# Expand the binned counts into pseudo-observations, one representative value per bin
x = np.concatenate((np.repeat(.25, 42), np.repeat(1, 444), np.repeat(2, 871), np.repeat(3, 1130),
            np.repeat(4, 1119), np.repeat(5, 934), np.repeat(6, 703),
            np.repeat(7, 490), np.repeat(8, 351), np.repeat(9, 219),
            np.repeat(10, 143), np.repeat(11, 52), np.repeat(12, 13),
            np.repeat(13, 15), np.repeat(14, 6), np.repeat(15, 6),
            np.repeat(16, 4), np.repeat(17, 3), np.repeat(18, 4), [20, 20]))

print(weibull_min.fit(x))  # 1: returns (shape, loc, scale)
(1.8742154858771933, 0.13126151114447493, 4.99670007482597)

print(weibull_min.fit(x, floc=0))  # 2: loc fixed at 0
(1.9446899445880135, 0, 5.155845183708194)
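
The same fit can be written more compactly and then checked against the table by turning the fitted CDF into expected bin percentages. This is only a minimal sketch: it assumes the bin representatives used in the snippet above (0.25 for the first bin, the integers 1 to 18, and 20 for the last row) and assumes bin edges at half-integers as described in the question. The names `values`, `counts`, `edges`, `expected_pct` and `observed_pct` are illustrative.

```python
import numpy as np
from scipy.stats import weibull_min

# Counts from the question's table (20 rows).
counts = np.array([42, 444, 871, 1130, 1119, 934, 703, 490, 351, 219,
                   143, 52, 13, 15, 6, 6, 4, 3, 4, 2])
# One representative value per bin (assumption: 0.25 for the first bin,
# 20 for the last row, integers 1..18 in between).
values = np.array([0.25] + list(range(1, 19)) + [20], dtype=float)

# Expand the binned table into pseudo-observations and fit with loc fixed at 0.
x = np.repeat(values, counts)
shape, loc, scale = weibull_min.fit(x, floc=0)

# Expected percentage per bin = 100 * (CDF(upper) - CDF(lower)).
# Assumed edges 0, 0.5, 1.5, ..., 18.5 cover the first 19 rows only.
edges = np.concatenate(([0.0], np.arange(0.5, 19.0, 1.0)))
expected_pct = 100 * np.diff(weibull_min.cdf(edges, shape, loc=loc, scale=scale))
observed_pct = 100 * counts[:19] / counts.sum()

print(f"shape={shape:.3f}, scale={scale:.3f}")
for lo, hi, obs, fit in zip(edges[:-1], edges[1:], observed_pct, expected_pct):
    print(f"{lo:4.1f}-{hi:4.1f} km/h  observed {obs:5.2f}%  fitted {fit:5.2f}%")
```

If a plotted curve looks wrong next to the histogram, one possible cause is comparing the raw PDF (a density per km/h) with frequencies in percent; the CDF differences above are on the same scale as the table's frequency column.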

Answer 2

This may or may not help you, but here is how you could do it in R.

text="
Category    Amount  'Frequency (in %)'
'0-1 km/h'    42      0.64
'1-2 km/h'    444     6.78
'2-3 km/h'    871     13.30
'3-4 km/h'    1130    17.25
'4-5 km/h'    1119    17.08
'5-6 km/h'    934     14.26
'6-7 km/h'    703     10.73
'7-8 km/h'    490     7.48
'8-9 km/h'    351     5.36
'9-10 km/h'    219     3.34
'10-11km/h'   143     2.18
'11-12 km/h'  52      0.79
'12-13 km/h'  13      0.20
'13-14 km/h'  15      0.23
'14-15 km/h'  6       0.09
'15-16 km/h'  6       0.09
'16-17 km/h'  4       0.06
'17-18 km/h'  3       0.05
'18-19 km/h'  4       0.06
'20-21 km/h'  2       0.03
"
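library(dplyr)         # mutate(), select()
library(tidyr)         # uncount()
library(fitdistrplus)  # fitdistcens()

df=read.table(text=text, header=TRUE)
# Bin edges as described in the question: [0, 0.5), [0.5, 1.5), ...
# (the 20 rows are treated as consecutive bins)
left=c(0)
right=c(.5)
for (i in 2:20) {
  left[i]=i-2+.5
  right[i]=i-1+.5
}
df1=mutate(df, left=left, right=right)
# One row per observation, each carrying its interval bounds
df1=uncount(df1, Amount)
bins=select(df1, left, right)
# Maximum likelihood fit on interval-censored data
fitdistcens(bins, "weibull")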

Fitting of the distribution ' weibull ' on censored data by maximum likelihood 
Parameters:
      estimate
shape 1.953459
scale 5.152375

Answer 3

This is a case of interval-censored data. That is, the exact value of each data point is not known, but it is known to have occurred within some window.

The Python package surpyval (I am its author) is a good way to do this.

import surpyval as surv

# count vector
n = [42, 444, 871, 1130, 1119, 934, 703, 490, 351, 219, 143, 52, 13, 15, 6, 6, 4, 3, 4, 2]
# interval vector: [1, 2], [2, 3], ..., [20, 21]
x = [[l, u] for l, u in zip(range(1, 21), range(2, 22))]

model = surv.Weibull.fit(x=x, n=n)
model

Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 6.8296115421888715
      beta: 2.6063921449099317

It also appears that your data is actually right-truncated. That is, you have no observations above 21. This can also be added to the estimate.


model = surv.Weibull.fit(x=x, n=n, tr=21)
model

Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 6.829611684656533
      beta: 2.606391638781735

In this case, though, it barely changes the estimates.
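
The interval-censored likelihood itself can also be sketched directly with SciPy, which makes the idea explicit: each bin contributes count × log(F(upper) − F(lower)) to the log-likelihood. This is a hedged illustration and not how surpyval works internally. It assumes the bin edges from the question's text ([0, 0.5), [0.5, 1.5), ...) and assumes the last row (labelled 20-21 km/h) covers [19.5, 20.5). The names `lower`, `upper` and `neg_log_likelihood` are made up for this example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

# Counts from the question's table (20 rows).
counts = np.array([42, 444, 871, 1130, 1119, 934, 703, 490, 351, 219,
                   143, 52, 13, 15, 6, 6, 4, 3, 4, 2])

# Assumed interval bounds: [0, 0.5), [0.5, 1.5), ..., [17.5, 18.5),
# and [19.5, 20.5) for the last row (the table has no 19-20 row).
lower = np.concatenate(([0.0], np.arange(0.5, 18.0, 1.0), [19.5]))
upper = np.concatenate((np.arange(0.5, 19.0, 1.0), [20.5]))

def neg_log_likelihood(log_params):
    """Negative log-likelihood of a 2-parameter Weibull for binned counts."""
    shape, scale = np.exp(log_params)  # optimise on the log scale to keep both > 0
    prob = (weibull_min.cdf(upper, shape, scale=scale)
            - weibull_min.cdf(lower, shape, scale=scale))
    return -np.sum(counts * np.log(np.clip(prob, 1e-300, None)))

result = minimize(neg_log_likelihood, x0=np.log([2.0, 5.0]), method="Nelder-Mead")
shape, scale = np.exp(result.x)
print(f"shape={shape:.4f}, scale={scale:.4f}")
```

Because the intervals enter the likelihood directly, no representative value per bin has to be chosen, unlike with the np.repeat approach.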

3 answers

solution1
1 ACCEPTED 2021-03-08 15:06:46

solution2
0 2021-03-06 07:13:44

solution3
0 2021-12-13 23:53:04
