Create a time series from data
I have a dataframe which contains information on defaults within a loan portfolio and the time from origination at which each default occurred. Each 'observation' is a pair representing time t in days and the amount of the loan default:
df['time_to_default'] # Time from origination to default
df['default_amnt'] # The loan amount defaulted
I would like to create a series which represents the cumulative amount of defaults at any given time t. (Assume that time_to_default is evenly divisible by t.) I cannot figure out how to create a new dataframe element, assign it an initial value of 0 and then iterate through the series...
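For comparison, the loop the question describes (start at 0, walk through the days, add each day's defaults) might look like the sketch below. It assumes integer day counts starting at day 0 and uses hypothetical example data; it scans the whole frame once per day, so it is slow on a large portfolio, which is why the vectorised groupby/cumsum answer that follows is preferable.

```python
import pandas as pd

# Hypothetical example data in the shape the question describes
df = pd.DataFrame({'time_to_default': [1, 3, 3, 6],
                   'default_amnt': [10, 20, 30, 40]})

max_day = int(df['time_to_default'].max())
running = 0  # the initial value of 0 mentioned in the question
totals = []
for day in range(max_day + 1):
    # Add every default that occurred exactly on this day (0 if none did)
    running += df.loc[df['time_to_default'] == day, 'default_amnt'].sum()
    totals.append(running)

cum_loop = pd.Series(totals,
                     index=pd.Index(range(max_day + 1), name='time_to_default'),
                     name='default_amnt')
print(cum_loop)
```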
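It sounds like you need groupby for this, together with cumsum since you want a running total. Grouping on time_to_default sums the defaults that fall on the same day, and cumsum then accumulates those daily totals in day order (groupby sorts its keys by default):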
cum_defaults = df.groupby('time_to_default').default_amnt.sum().cumsum()
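To make the two stages of that one-liner visible, here is a small sketch that keeps the intermediate per-day totals in a separate (illustrative) variable, daily, using the same values as the example data in this answer:

```python
import pandas as pd

df = pd.DataFrame({'time_to_default': [1, 3, 3, 6],
                   'default_amnt': [10, 20, 30, 40]})

# Stage 1: total amount defaulted on each distinct day (keys come out sorted)
daily = df.groupby('time_to_default').default_amnt.sum()
# daily: day 1 -> 10, day 3 -> 20 + 30 = 50, day 6 -> 40

# Stage 2: running total over the sorted days
cum_defaults = daily.cumsum()
# cum_defaults: day 1 -> 10, day 3 -> 60, day 6 -> 100
```

Note that only days on which a default actually occurred appear in the index at this point.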
You then need to reindex this new series to fill in any missing days, forward-filling so that each day without a default carries the previous running total:
cum_defaults = cum_defaults.reindex(index=range(min(cum_defaults.index),
                                                max(cum_defaults.index) + 1),
                                    method='ffill')
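If you want the series to start at day 0 with an initial value of 0, as the question suggests, one option (an assumption, not part of the original answer) is to start the range at 0 and fill the leading gap afterwards. The sketch below also shows why method='ffill' is needed: a plain reindex leaves NaN on days with no default. The names days, gappy and filled are illustrative.

```python
import pandas as pd

# The running totals produced by the groupby/cumsum step for the example data
cum_defaults = pd.Series([10, 60, 100],
                         index=pd.Index([1, 3, 6], name='time_to_default'),
                         name='default_amnt')

days = range(0, int(cum_defaults.index.max()) + 1)

# Without method='ffill', every day that had no default becomes NaN
gappy = cum_defaults.reindex(days)

# ffill carries the last running total forward; day 0 precedes the first
# default, so there is nothing to carry and it is set to 0 explicitly
filled = cum_defaults.reindex(days, method='ffill').fillna(0).astype(int)
print(filled)
```

The astype(int) call is only there because the temporary NaN on day 0 forces the column to float.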
With some example data:
import pandas as pd

df = pd.DataFrame({'time_to_default': [1, 3, 3, 6],
                   'default_amnt': [10, 20, 30, 40]})
>>> cum_defaults
time_to_default
1 10
2 10
3 60
4 60
5 60
6 100
Name: default_amnt, dtype: int64
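Putting both steps together, a self-contained sketch that reproduces the output above might look like this; cumulative_defaults is a hypothetical helper name, not something from pandas or the original answer:

```python
import pandas as pd


def cumulative_defaults(df: pd.DataFrame) -> pd.Series:
    """Running total of defaulted amounts for every day between the first
    and last default (hypothetical helper wrapping the answer's two steps)."""
    # Sum defaults per day, then accumulate them in day order
    cum = df.groupby('time_to_default')['default_amnt'].sum().cumsum()
    # Insert the days with no defaults and carry the last total forward
    days = range(int(cum.index.min()), int(cum.index.max()) + 1)
    return cum.reindex(days, method='ffill')


df = pd.DataFrame({'time_to_default': [1, 3, 3, 6],
                   'default_amnt': [10, 20, 30, 40]})
result = cumulative_defaults(df)
print(result)
```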