Vectorized Backtester in Pandas/Python: Loop through each stock as a new dataframe or put it all in one dataframe?

Original post: 2021-06-30 08:26:54 | Tags: python, pandas, dataframe, loops, finance

I've been trying to build my own simple vectorized backtester in Pandas/Python to create a simple way to test some trading strategies. I have been using this article as a guide and it has been pretty helpful.

I want to perform a simple portfolio backtest of, say, 10 stocks/ETFs. For each stock I will have a dataframe with the date as the row index and the Open, High, Low, and Close prices for that date as columns (financial time series data). So I will have, say, 10 of these dataframes with 4 columns each. What would be the most pythonic and efficient way to do the backtest:

Work on each dataframe separately, by looping through and carrying out my calculations on each dataframe, then summing the profits at the end.

OR

Concatenating all the dataframes together and just working on the one dataframe.

In the example article I have been using, he works with just one dataframe, but he only uses the Close price, so he doesn't need a column multi-index. I would need a column multi-index (level 0 is the stock name, level 1 is the Close, Open, High, Low, etc.) and, given my beginner pandas status, that's making things complicated for me. I've been thinking it would be easier for me to create a loop and work with 10 separate dataframes, but I'm wondering if this is just lazy and will hinder my development in the long run.
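To make the two options concrete, here is a minimal sketch of both. It is not taken from the article: the tickers (`AAA`, `BBB`, `CCC`), the synthetic prices, the `make_ohlc` helper, and the toy buy-and-hold "strategy" are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic OHLC data for three hypothetical tickers (illustration only).
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=5, freq="D")

def make_ohlc(start):
    """Hypothetical helper: build a small OHLC frame around a random walk."""
    close = start + rng.normal(0, 1, len(dates)).cumsum()
    return pd.DataFrame(
        {"Open": close - 0.5, "High": close + 1.0, "Low": close - 1.0, "Close": close},
        index=dates,
    )

frames = {"AAA": make_ohlc(100.0), "BBB": make_ohlc(50.0), "CCC": make_ohlc(20.0)}

# Option 1: loop over the separate dataframes and sum the profits at the end.
# Toy "strategy" (an assumption): buy one share at the first close, sell at the last.
profit_loop = sum(df["Close"].iloc[-1] - df["Close"].iloc[0] for df in frames.values())

# Option 2: concatenate into one dataframe with a column multi-index.
# Level 0 is the stock name, level 1 is the price field.
wide = pd.concat(frames, axis=1)
wide.columns.names = ["Ticker", "Field"]

aaa = wide["AAA"]                                 # all fields for one stock
closes = wide.xs("Close", axis=1, level="Field")  # one field for every stock
profit_wide = (closes.iloc[-1] - closes.iloc[0]).sum()

print(profit_loop, profit_wide)
```

Both routes give the same total here. The difference is that in the second one each calculation is written once and applied to every ticker's column at the same time.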

1 Answer

A single df of closes, with one column per stock, is the simplest. To use the other fields you need a MultiIndex (https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html). The issue I found with a MultiIndex is that adding columns to it requires some hacking of the df with every change.
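As a rough illustration of both points, the sketch below runs a close-only backtest on one dataframe and then shows one way a new per-ticker field might be attached to a MultiIndex dataframe. The tickers, the synthetic prices, the 5-day moving-average rule, and the use of the previous close as a stand-in for Open are all assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Synthetic closes for three hypothetical tickers (illustration only).
rng = np.random.default_rng(1)
dates = pd.date_range("2021-01-01", periods=30, freq="D")
tickers = ["AAA", "BBB", "CCC"]
closes = pd.DataFrame(
    100 + rng.normal(0, 1, (len(dates), len(tickers))).cumsum(axis=0),
    index=dates,
    columns=tickers,
)

# Close-only backtest: each step operates on every ticker at once.
returns = closes.pct_change()
# Toy rule (an assumption): hold a stock while it closes above its 5-day mean.
signal = (closes > closes.rolling(5).mean()).astype(int)
# Shift the signal one bar so a position only earns the following day's return.
strategy_returns = signal.shift(1) * returns
portfolio_returns = strategy_returns.mean(axis=1)  # equal weight across tickers
equity_curve = (1 + portfolio_returns.fillna(0)).cumprod()

# MultiIndex version: (ticker, field) columns. Open is faked as the previous close.
wide = pd.concat(
    {t: pd.DataFrame({"Open": closes[t].shift(1), "Close": closes[t]}) for t in tickers},
    axis=1,
)
wide.columns.names = ["Ticker", "Field"]

# Adding a field: relabel the new columns as (ticker, "Return") before joining,
# then sort so that each ticker's fields sit together again.
new_field = returns.copy()
new_field.columns = pd.MultiIndex.from_product(
    [new_field.columns, ["Return"]], names=wide.columns.names
)
wide = pd.concat([wide, new_field], axis=1).sort_index(axis=1)
```

The relabel-and-sort step at the end is the kind of extra handling each new field needs. With the close-only dataframe, a derived quantity is simply another dataframe of the same shape.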