简体   繁体   中英

Calculate Statistical Metrics On Time-Series Data

I have a dataframe that looks like this:

        ID          DATE       QTD
0       71896517    2020-07-25  1
1       71896517    2020-09-14  2
2       72837949    2020-09-21  1
3       72848188    2020-11-03  1
4       73307986    2020-11-04  1
5       72848188    2020-11-16  1
6       71896517    2020-11-22  1
7       73307986    2020-11-25  1
8       73307986    2021-01-04  1
9       73307986    2021-02-04  1
10      72848188    2021-02-07  1
11      72837949    2021-02-11  1

I want to add a column to the dataframe for the monthly mean of the QTD column. This metrics should be calculated for each ID individually (so every ID has its own values) and should incremement as the dataframe grows in DATE (which is the year-month-day). If the ID doesn't have activity in a certain month, I want that month to still be included on the calculated metrics.

I want the end dataframe to look like this:

        ID          DATE       QTD    MEAN
0       71896517    2020-07-25  1     1.0
1       71896517    2020-09-14  2     1.0
2       72837949    2020-09-21  1     1.0
3       72848188    2020-11-03  1     1.0
4       73307986    2020-11-04  1     1.0
5       72848188    2020-11-16  1     2.0
6       71896517    2020-11-22  1     0.8
7       73307986    2020-11-25  1     2.0
8       73307986    2021-01-04  1     1.0
9       73307986    2021-02-04  1     1.0
10      72848188    2021-02-07  1     0.75
11      72837949    2021-02-11  1     0.33

How can I go about to implement this?

There's usually an iterative solution.

import pandas as pd
import numpy as np

def monthdelta(a,b):
    a1,a2,a3 = (int(k) for k in a.split('-'))
    b1,b2,b3 = (int(k) for k in b.split('-'))
    return (a1*12+a2) - (b1*12+b2)

data = [
[ 71896517, "2020-07-25", 1 ],
[ 71896517, "2020-09-14", 2 ],
[ 72837949, "2020-09-21", 1 ],
[ 72848188, "2020-11-03", 1 ],
[ 73307986, "2020-11-04", 1 ],
[ 72848188, "2020-11-16", 1 ],
[ 71896517, "2020-11-22", 1 ],
[ 73307986, "2020-11-25", 1 ],
[ 73307986, "2021-01-04", 1 ],
[ 73307986, "2021-02-04", 1 ],
[ 72848188, "2021-02-07", 1 ],
[ 72837949, "2021-02-11", 1 ],
]

df = pd.DataFrame( data, columns=["ID", "DATE", "QTD"] )

startdate = {}
sums = {}
sumsqs = {}
num = {}
stdev = []
means = []

for row in df.T.iteritems():
    id = row[1]['ID']
    if id not in startdate:
        num[id] = 1
        startdate[id] = row[1]['DATE']
        sums[id] = row[1]['QTD']
        sumsqs[id] = row[1]['QTD'] * row[1]['QTD']
        means.append( row[1]['QTD'] )
        stdev.append( 0 )
    else:
        num[id] += 1
        sums[id] += row[1]['QTD']
        sumsqs[id] += row[1]['QTD'] * row[1]['QTD']
        delta = monthdelta(row[1]['DATE'],startdate[id]) + 1
        means.append( sums[id] / delta )
        if delta == 1:
            stdev.append( 0 )
        else:
            stdev.append( np.sqrt((delta*sumsqs[id] - sums[id]*sums[id])/delta))

print( means )
df['MEAN'] = pd.Series(means)
print( stdev )
df['STDEV'] = pd.Series(stdev)
print( df )

Output:

[1, 1.0, 1, 1, 1, 2.0, 0.8, 2.0, 1.0, 1.0, 0.75, 0.3333333333333333]
[0, 1.4142135623730951, 0, 0, 0, 0, 1.6733200530681511, 0, 0.0, 0.0, 0.8660254037844386, 1.1547005383792515]
          ID        DATE  QTD      MEAN     STDEV
0   71896517  2020-07-25    1  1.000000  0.000000
1   71896517  2020-09-14    2  1.000000  1.414214
2   72837949  2020-09-21    1  1.000000  0.000000
3   72848188  2020-11-03    1  1.000000  0.000000
4   73307986  2020-11-04    1  1.000000  0.000000
5   72848188  2020-11-16    1  2.000000  0.000000
6   71896517  2020-11-22    1  0.800000  1.673320
7   73307986  2020-11-25    1  2.000000  0.000000
8   73307986  2021-01-04    1  1.000000  0.000000
9   73307986  2021-02-04    1  1.000000  0.000000
10  72848188  2021-02-07    1  0.750000  0.866025
11  72837949  2021-02-11    1  0.333333  1.154701

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM