
How to get the cumulative percentile value of each row in a Python time series

How do I get the cumulative percentile value?

Dates
1990-01-02    17.24
1990-01-03    18.19
1990-01-04    19.22
1990-01-05    20.11
1990-01-08    20.26
1990-01-09    22.20
1990-01-10    22.44
1990-01-11    20.05
1990-01-12    24.64
1990-01-15    26.34
1990-01-16    24.18

That is, the percentile value of the 2nd row within the first 2 rows of data, the percentile value of the 3rd row within the first 3 rows of data, and so forth.

You can do something like this:

import numpy as np
import pandas as pd

# Replace filename.xlsx with the name of your Excel file.
# The code assumes the columns are named 'date' and 'val'.
df = pd.read_excel('filename.xlsx')

# This doesn't affect the percentile calculation, but it lets you
# use the full power of the pandas datetime functions.
df['date'] = pd.to_datetime(df['date'])

val_list = df['val'].values
perc = []

for r in range(len(val_list)):
    # Percentile of all values from the first row up to and including row r.
    # Change 50 to the percentile you want to calculate.
    perc.append(np.percentile(val_list[:r + 1], 50))

df['percentile'] = perc

print(df)
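
The loop above gives the chosen percentile (here the 50th) of all rows seen so far. If the question is instead read as asking where each row's own value ranks within the rows up to and including it, a minimal sketch could look like the following. The sample values are copied from the question; the series name `val`, the helper `rank_of_last`, and the "fraction of values less than or equal to the newest one" definition of percentile rank are assumptions, and other tie-handling conventions exist.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the name "val" is an assumption.
s = pd.Series(
    [17.24, 18.19, 19.22, 20.11, 20.26, 22.20,
     22.44, 20.05, 24.64, 26.34, 24.18],
    index=pd.to_datetime([
        "1990-01-02", "1990-01-03", "1990-01-04", "1990-01-05",
        "1990-01-08", "1990-01-09", "1990-01-10", "1990-01-11",
        "1990-01-12", "1990-01-15", "1990-01-16",
    ]),
    name="val",
)


def rank_of_last(window):
    # Hypothetical helper: fraction of the values seen so far that are
    # less than or equal to the newest value in the window.
    return np.mean(window <= window[-1])


# expanding() grows the window one row at a time, starting at the first row;
# raw=True hands each window to the helper as a NumPy array.
pct_rank = s.expanding().apply(rank_of_last, raw=True)

print(pct_rank)
```

A value of 1.0 means the row is the highest seen so far; multiply by 100 if you want the result on a 0 to 100 scale.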

A few key points to note here:

1) I did the calculation by importing your data into a pandas DataFrame. If you want to work with a plain NumPy array instead, a few tweaks to the code above should do it, but a pandas DataFrame is an elegant way to look at tabular data in Python.

2) This is not the most efficient approach: the percentile is recomputed over a growing slice for every row, so the work grows roughly quadratically with the number of rows. It gets the job done, but use it with care on very large datasets.

3) Read the comments in the code.
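
As a possible alternative to the explicit loop mentioned in point 2, pandas can compute the same expanding percentile through `expanding().quantile()`. This is only a sketch: the DataFrame is built inline from the question's sample data instead of being read from Excel, and the column names `date` and `val` are the same assumptions as in the code above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; column names are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "1990-01-02", "1990-01-03", "1990-01-04", "1990-01-05",
        "1990-01-08", "1990-01-09", "1990-01-10", "1990-01-11",
        "1990-01-12", "1990-01-15", "1990-01-16",
    ]),
    "val": [17.24, 18.19, 19.22, 20.11, 20.26, 22.20,
            22.44, 20.05, 24.64, 26.34, 24.18],
})

# quantile() takes a fraction between 0 and 1, so 0.5 corresponds to the
# 50 passed to np.percentile in the loop version.
q = 0.5
df["percentile"] = df["val"].expanding().quantile(q)

print(df)
```

Both versions use linear interpolation by default, so they should agree on the same input; change `q` to the fraction you need.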

Hope this helps. If it doesn't, reply in the comments below and I'll try to sort it out.
