Iteratively testing the performance of portfolio weights in pandas
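I have a portfolio of x equity factors (say x = 8) with weights w. Using a function F(w) = p, where the output p is a performance metric, I can test how these factors perform when equally weighted. The portfolio weights df for the first test looks like this: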
weight
0 12.57
1 12.49
2 12.49
3 12.49
4 12.49
5 12.49
6 12.49
7 12.49
Here I used 100/8 rounded down to two decimal places and added the remainder to the first factor, like this:
weights = pd.DataFrame(columns=['weight'], data=[(100/factorsdf.shape[0] // 0.01 / 100)]*factorsdf.shape[0])
weights.loc[0, 'weight'] = weights.loc[0, 'weight'] + round(100 - (100/factorsdf.shape[0] // 0.01 / 100) * factorsdf.shape[0], 2)
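As a small sketch, the same initialization can be wrapped in a helper to make the arithmetic explicit. The name `equal_weights` is hypothetical and does not appear in the original post. The base weight comes out as 12.49 rather than 12.50 because 0.01 is not exactly representable in binary floating point, so `12.5 // 0.01` evaluates to `1249.0`.

```python
import pandas as pd

def equal_weights(n_factors: int) -> pd.DataFrame:
    """Hypothetical helper: equal weights in percent, floored to two decimals."""
    # Same expression as in the question; for 8 factors, 12.5 // 0.01 is 1249.0
    base = 100 / n_factors // 0.01 / 100
    weights = pd.DataFrame({"weight": [base] * n_factors})
    # Give the rounding remainder to the first factor so the total is 100
    weights.loc[0, "weight"] += round(100 - base * n_factors, 2)
    return weights

weights = equal_weights(8)
print(weights)
print("total:", round(weights["weight"].sum(), 2))
```

Using `.loc[0, "weight"]` writes directly into the frame, which avoids relying on chained assignment.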
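What I now want to do is improve these weights iteratively. My idea is to progressively subtract half of the base weight (here: 12.49/2 = 6.245) from the bottom weight (index 7) and add it to the weight above it (index 6).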
incremental_weight = (100/factorsdf.shape[0] // 0.01 / 100)/2
So in the first iteration I test the equally weighted portfolio.
In the next iteration I would test:
weight
0 12.57
1 12.49
2 12.49
3 12.49
4 12.49
5 12.49
6 18.735
7 6.245
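A single step of this kind can be sketched as a small function that moves a fixed amount from one row to another and leaves the total unchanged. The helper name `move_weight` is an assumption made for illustration, and plain numpy arrays are used instead of a dataframe to keep the example short.

```python
import numpy as np

def move_weight(weights: np.ndarray, src: int, dst: int, amount: float) -> np.ndarray:
    """Hypothetical helper: return a copy with `amount` moved from row src to row dst."""
    out = weights.astype(float)  # astype returns a new array, so the input is untouched
    out[src] -= amount
    out[dst] += amount
    return out

weights = np.array([12.57] + [12.49] * 7)
step = 12.49 / 2  # half of the base weight

# Move half of the bottom weight (index 7) to the row above it (index 6)
second = move_weight(weights, src=7, dst=6, amount=step)
print(second)
```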
If this does not improve the metric p, I will try adding it to the next weight up in the dataframe instead, like this:
weight
0 12.57
1 12.49
2 12.49
3 12.49
4 12.49
5 18.735
6 12.49
7 6.245
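If it does improve, I will subtract another 6.245 from the bottom element and again try adding it to the element above. If that does not improve p, I will try adding it to the one above that, and so on, until I have tried adding it to every other element.
After that, I will run the same process for the element above the bottom one (index 6), subtracting 6.245 and iteratively adding it to the other elements, and so on, until the metric p cannot be improved any further.
What would be a good way to program this in pandas in a relatively efficient way?
If I have understood everything correctly, the following should work (I have added comments to the code):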
import numpy as np
import pandas as pd

# my randomly generated factor scores
factorsdf = pd.DataFrame({"Factor": np.random.randint(0, 100, 8)})

# weights evenly allocated (from your code)
weights = pd.DataFrame(columns=['weight'], data=[(100/factorsdf.shape[0] // 0.01 / 100)]*factorsdf.shape[0])
# .loc avoids chained assignment, which pandas may not write back to the frame
weights.loc[0, 'weight'] += round(100 - (100/factorsdf.shape[0] // 0.01 / 100) * factorsdf.shape[0], 2)

# weight to increment by (from your code)
incremental_weight = (100/factorsdf.shape[0] // 0.01 / 100)/2

# the F(w) function: weights must be first, from np.apply_along_axis below
def F_w(weights, factorsdf):
    # here it is just the dot product of the random factors and the weights.
    p = np.dot(factorsdf, weights.T/100)
    return p

# initial value of p
p = F_w(np.array(weights["weight"]), np.array(factorsdf["Factor"]))
print("p: ", p)

# rows above the weight that is being decreased
i = len(weights) - 1
# copy of weights to change
weights_c = weights.copy(deep=True)

# can't decrease the top weight, because none above to change
while i >= 1:
    print("i: ", i)
    # series with 0s everywhere except on row i, which will be decreasing the weight
    # (len(weights_c) keeps the decrement on row i after bottom rows are dropped)
    dec = pd.Series([0]*(len(weights_c)-1) + [incremental_weight]).shift(-len(weights_c)+1+i).fillna(0)
    # increment array has all possible options for increasing weights above
    inc = np.identity(i)*incremental_weight
    # row that is decreasing should not increase
    inc = np.vstack([inc, [0]*inc.shape[1]])
    # if other rows below have a weight (are not 0) then these won't increase either
    while inc.shape[0] < len(weights_c):
        inc = np.vstack([inc, [0]*inc.shape[1]])
    # if weight will be below 0 then should move up, also if there is only one row left
    if ((weights_c["weight"].sub(dec) < 0).sum() > 0) or (inc.shape[0] == 1):
        i -= 1
        continue
    # subtract the increment from the weight
    weights_c["weight"] = weights_c["weight"].sub(dec)
    # array of possible weightings when incrementing rows above
    weights_arr = np.repeat(np.array(weights_c), inc.shape[1], axis=1) + inc
    print("weights_arr: ", weights_arr)
    print("weights_c: ", weights_c)
    print("factorsdf: ", np.array(factorsdf).T[:, :weights_arr.shape[0]])
    # return the p values for F(w) for all the possible variations
    res = np.apply_along_axis(F_w, 0, weights_arr, np.array(factorsdf).T[:, :weights_arr.shape[0]])
    # if any of these are greater than current p...
    if res.max() > p:
        # set new value of p
        p = res.max()
        # find which row the increment should be added to
        x = res.argmax()
        # add the increment to the correct row (column 0 is "weight")
        weights_c.iloc[x, 0] += incremental_weight
        print("x: ", x)
    # if not, move on
    else:
        weights_c["weight"] = weights_c["weight"].add(dec)
        i -= 1
    # if the bottom weight is 0, remove this row from calculations
    if weights_c["weight"].iloc[-1] == 0:
        # copy so that later assignments do not act on a slice of the old frame
        weights_c = weights_c.iloc[:-1].copy()
        i -= 1
    print("weights_c: ", weights_c)

# print final weightings and max value of p
print("weights_c: ", weights_c)
print("p: ", p)
It can certainly be cleaned up a bit, for example by removing some of the while loops, but it should give you a good starting point.
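One possible cleanup is sketched here, under the assumption that F(w) is the linear dot product used as a stand-in in the answer (a real performance metric may well not be linear). In that case all candidate weight vectors for a given source row can be built with one broadcast and scored with one matrix product, which would replace `np.apply_along_axis`. The function name `score_candidates` and the sample factor values are made up for illustration.

```python
import numpy as np

def score_candidates(weights, factors, src, step):
    """Score every 'move `step` from row src to a row above it' candidate.

    Assumes the metric is the linear stand-in factors @ weights / 100.
    Returns (candidates, scores); column j of candidates adds step to row j.
    """
    n = len(weights)
    base = weights.astype(float)             # copy of the current weights
    base[src] -= step                        # take the step from the source row
    inc = np.zeros((n, src))
    inc[:src, :] = np.identity(src) * step   # one column per destination row
    candidates = base[:, None] + inc         # shape (n, src)
    scores = factors @ candidates / 100      # one score per candidate
    return candidates, scores

weights = np.array([12.57] + [12.49] * 7)
factors = np.array([10, 80, 30, 55, 20, 65, 40, 5])  # illustrative values only
candidates, scores = score_candidates(weights, factors, src=7, step=12.49 / 2)
best = int(scores.argmax())
print("best destination row:", best, "score:", scores[best])
```

With a non-linear metric the same `candidates` array could still be reused by calling the metric once per column.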
As you can see, I use numpy a lot more than pandas; the code would be shorter if the weights and factors were arrays to begin with.
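To illustrate that point, here is a rough array-only sketch of the same idea. It is not the code from the answer: `greedy_reweight` is a hypothetical name, the metric is passed in as any callable where higher is better, and the factor values are invented. Like the answer's code, it takes the best of the candidate moves rather than the first one that improves p, and it only moves weight to rows above the source row.

```python
import numpy as np

def greedy_reweight(weights, metric, step):
    """Greedy search sketch: move `step` from lower rows to rows above them
    for as long as `metric` (higher is better) improves."""
    w = np.asarray(weights, dtype=float).copy()
    best_p = metric(w)
    for src in range(len(w) - 1, 0, -1):      # bottom row first, never row 0
        while w[src] - step >= -1e-12:        # keep weights non-negative
            trials = []
            for dst in range(src):            # every row above the source
                cand = w.copy()
                cand[src] -= step
                cand[dst] += step
                trials.append((metric(cand), dst, cand))
            p, dst, cand = max(trials, key=lambda t: t[0])
            if p <= best_p:                   # no improvement: next source row
                break
            w, best_p = cand, p
    return w, best_p

factors = np.array([10, 80, 30, 55, 20, 65, 40, 5])   # illustrative values only
start = np.array([12.57] + [12.49] * 7)
final_w, final_p = greedy_reweight(start, lambda w: factors @ w / 100, step=12.49 / 2)
print(final_w, final_p)
```

Because the metric is just a callable, the linear stand-in used here can be swapped for a real F(w) without touching the loop.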
Let me know if you have any questions.