Faster way to iterate in numpy / pandas?

Question

I have a large portfolio of bonds and I want to create a table with days as the index, the bonds as columns, and the notional of each bond as the values.

For each bond, I need to set the rows before its start date and after its maturity date to 0.

Is there a more efficient way than this:

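import numpy as np
from dateutil.relativedelta import relativedelta

# bonds: DataFrame with inception, maturity and principal columns
# idx_d: the days to tabulate; m: a number of months (both defined elsewhere)
[[np.where((day >= bonds.inception[i]) &
           (day + relativedelta(months=+m) >= bonds.maturity[i]) &
           (day <= bonds.maturity[i]),
           bonds.principal[i],
           0)
  for i in range(bonds.shape[0])] for day in idx_d]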

input example:

id	nom	inception	maturity
38	200	22/04/2022	22/04/2032
87	100	22/04/2022	22/04/2052

output example:

day	38	87
21/04/2022	0	0
22/04/2022	200	100

Answer 1

The solution below still requires a loop. I don't know whether it's faster, or whether you'll find it clearer, but I'll offer it as an alternative.

Create an example dataframe (with a few extra bonds for demonstration purposes):

import pandas as pd

df = pd.DataFrame({'id': [38, 87, 49, 51, 89], 
                   'nom': [200, 100, 150, 50, 250],
                   'start_date': ['22/04/2022', '22/04/2022', '01/01/2022', '01/05/2022', '23/04/2012'],
                   'end_date': ['22/04/2032', '22/04/2052', '01/01/2042', '01/05/2042', '23/04/2022']})
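# The strings are day-first (dd/mm/yyyy). Without dayfirst=True or an explicit
# format, an ambiguous value such as '01/05/2022' may be parsed month-first,
# which is what the output below shows for bond 51.
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])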
df = df.set_index('id')
print(df)

This then looks like:

id	nom	start_date	end_date
38	200	2022-04-22 00:00:00	2032-04-22 00:00:00
87	100	2022-04-22 00:00:00	2052-04-22 00:00:00
49	150	2022-01-01 00:00:00	2042-01-01 00:00:00
51	50	2022-01-05 00:00:00	2042-01-05 00:00:00
89	250	2012-04-23 00:00:00	2022-04-23 00:00:00
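Note that bond 51 was entered as '01/05/2022' but appears in the table above as 2022-01-05, i.e. it was read month-first, while '22/04/2022' was read day-first. If the intent is day-first throughout (an assumption on my part, based on the dd/mm/yyyy input in the question), passing an explicit format removes the ambiguity. A minimal sketch, with `raw` and `parsed` as illustrative names:

```python
import pandas as pd

# Day-first strings, as in the example above (dd/mm/yyyy).
raw = pd.Series(['22/04/2022', '01/05/2022'])

# An explicit format removes the ambiguity: '01/05/2022' becomes 1 May 2022.
parsed = pd.to_datetime(raw, format='%d/%m/%Y')
print(parsed)
```

With this change the start and end dates of bond 51 would fall on 1 May rather than 5 January, so its column in the final table would change accordingly.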

Now, create a new blank dataframe, with 0 as the default value:

new = pd.DataFrame(data=0, columns=df.index, index=pd.date_range('2022-04-20', '2062-04-22'))
new.index.rename('day', inplace=True)

Then, iterate over the columns (that is, the index of the original dataframe), select the rows that fall within each bond's interval, and set the column to that bond's 'nom' for those rows:

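for column in new.columns:
    # Boolean mask of the days on which this bond is active (both ends inclusive)
    sel = (new.index >= df.loc[column, 'start_date']) & (new.index <= df.loc[column, 'end_date'])
    # Assign the bond's notional as a scalar to the selected rows
    new.loc[sel, column] = df.loc[column, 'nom']
print(new)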

which results in:

day	38	87	49	51	89
2022-04-20 00:00:00	0	0	150	50	250
2022-04-21 00:00:00	0	0	150	50	250
2022-04-22 00:00:00	200	100	150	50	250
2022-04-23 00:00:00	200	100	150	50	250
2022-04-24 00:00:00	200	100	150	50	0
...
2062-04-21 00:00:00	0	0	0	0	0
2062-04-22 00:00:00	0	0	0	0	0

[14613 rows x 5 columns]
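If the loop over bonds turns out to be the bottleneck, the same table can probably be built without any Python-level loop by using NumPy broadcasting. The following is only a sketch and I have not benchmarked it. The sample data is a hypothetical two-bond subset of the example above, and `days`, `day_col` and `active` are names I made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the example above (dates already parsed).
df = pd.DataFrame(
    {
        'nom': [200, 100],
        'start_date': pd.to_datetime(['2022-04-22', '2022-04-22']),
        'end_date': pd.to_datetime(['2032-04-22', '2052-04-22']),
    },
    index=pd.Index([38, 87], name='id'),
)
days = pd.date_range('2022-04-20', '2062-04-22', name='day')

# Shape (n_days, 1) compared against (n_bonds,) broadcasts to (n_days, n_bonds).
day_col = days.values[:, None]
active = (day_col >= df['start_date'].values) & (day_col <= df['end_date'].values)

# Keep the notional where the bond is active, 0 elsewhere.
new = pd.DataFrame(np.where(active, df['nom'].values, 0), index=days, columns=df.index)
print(new.head())
```

The trade-off is memory: this materialises a full days-by-bonds boolean array, which may matter for a very large portfolio over a long date range.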

Answer 1
1 accepted 2022-03-12 09:12:49