Creating a time series from data

Question

I have a dataframe that contains information on defaults within a loan portfolio and the time from origination at which each default occurred. Each 'observation' is a pair representing the time t in days and the amount of the loan default:

df['time_to_default']  #  Time from origination to default
df['default_amnt']     #  The loan amount defaulted

I would like to create a series that represents the cumulative amount of defaults at any given time t (assume that time_to_default is evenly divisible by t). I cannot figure out how to create a new dataframe element, assign it an initial value of 0, and then iterate through the series.

Answer 1

It sounds like you need groupby together with cumsum, since you want a running total: first sum the defaults that fall on the same day, then accumulate those daily totals.

cum_defaults = df.groupby('time_to_default').default_amnt.sum().cumsum()
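To see why both steps are needed, here is a minimal sketch that splits the one-liner into its two parts. It uses the same sample values as the answer's example data; the intermediate name `per_day` is introduced here only for illustration.

```python
import pandas as pd

# Sample values matching the answer's example data: two loans default on day 3.
df = pd.DataFrame({'time_to_default': [1, 3, 3, 6],
                   'default_amnt': [10, 20, 30, 40]})

# Step 1: total defaulted amount per day (groupby sorts the days in ascending order).
per_day = df.groupby('time_to_default')['default_amnt'].sum()

# Step 2: running total over the sorted days.
cum_defaults = per_day.cumsum()

print(per_day.tolist())       # [10, 50, 40]
print(cum_defaults.tolist())  # [10, 60, 100]
```

At this point the series only has entries for days 1, 3 and 6, the days on which a default actually occurred.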

You then need to reindex this new series to fill in any missing days, carrying the last running total forward:

cum_defaults = cum_defaults.reindex(index=range(min(cum_defaults.index),
                                                max(cum_defaults.index) + 1), 
                                    method='ffill')
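The effect of method='ffill' is easiest to see by comparing it with a plain reindex. The sketch below assumes a small hand-made series of running totals; the names `full_range`, `with_gaps` and `filled` are illustrative only.

```python
import pandas as pd

# Hypothetical running totals, observed on days 1, 3 and 6 only.
cum_defaults = pd.Series([10, 60, 100], index=[1, 3, 6])

# Every day from the first to the last observed default, inclusive.
full_range = range(cum_defaults.index.min(), cum_defaults.index.max() + 1)

# Without a fill method, days with no default become NaN.
with_gaps = cum_defaults.reindex(full_range)

# With method='ffill', each missing day repeats the last known total.
filled = cum_defaults.reindex(full_range, method='ffill')

print(int(with_gaps.isna().sum()))  # 3
print(filled.tolist())              # [10, 10, 60, 60, 60, 100]
```

Forward filling is the right choice for a cumulative quantity, because the running total does not change on days without a default.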

With some example data:

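>>> import pandas as pd
>>> df = pd.DataFrame({'time_to_default': [1, 3, 3, 6],
...                    'default_amnt': [10, 20, 30, 40]})
>>> # ... apply the groupby/cumsum and reindex steps shown earlier ...
>>> cum_defaults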
time_to_default
1     10
2     10
3     60
4     60
5     60
6    100
Name: default_amnt, dtype: int64
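The question also asks for an initial value of 0 and for a series sampled at multiples of t. One possible way to cover both is sketched below. The helper name `cumulative_defaults` and its `step` parameter are hypothetical, not part of the original answer, and the sketch assumes a non-empty dataframe with non-negative integer values in time_to_default.

```python
import pandas as pd

def cumulative_defaults(df, step=1):
    """Running total of defaults on the grid 0, step, 2*step, ... (hypothetical helper)."""
    # Sum defaults that share a day, then accumulate over the sorted days.
    running = df.groupby('time_to_default')['default_amnt'].sum().cumsum()
    # Grid starts at day 0 and runs up to the last observed default day.
    grid = range(0, int(running.index.max()) + 1, step)
    # ffill carries the last total forward; grid points before the first
    # default have nothing to carry, so they are set to 0.
    filled = running.reindex(grid, method='ffill').fillna(0)
    # The NaN introduced before fillna upcasts to float; restore the dtype.
    return filled.astype(running.dtype)

df = pd.DataFrame({'time_to_default': [1, 3, 3, 6],
                   'default_amnt': [10, 20, 30, 40]})

print(cumulative_defaults(df).tolist())          # [0, 10, 10, 60, 60, 60, 100]
print(cumulative_defaults(df, step=3).tolist())  # [0, 60, 100]
```

No explicit loop over the series is needed: the grouping, accumulation and filling are all vectorized pandas operations.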

Answer 1
1, accepted, 2015-06-05 07:35:33
