Create pd series based on conditions on df1, and reporting values from df2 or df3
First post here. I'm new to Python, but have made a lot of progress leveraging the answers posted here to others' questions. Unfortunately, I'm having trouble with what seems to be an easy task. I have 3 pandas series, indexed on dates:
df1 = {'signal': [0,0,1,1,0,0,1]} #binary trading signal
df2 = {'SPX': [5,0,5,1,0,5,2]} #S&P 500 returns
df3 = {'UST': [-1,1,1,0,1,-1,0]} #10yr Treasury returns
I am trying to create a new series df4 that will represent the return profile of the trading signal. If the signal = 1, get the df3 value on that day; otherwise give me the df2 value (that is, for all the zeros).
I've found plenty of posts regarding this topic, which seems very simple, but have struggled to make them work. I tried a simple if statement...
df4 = df1
if df1 == 1:
    df4.replace(1, df3)
else:
    df4.replace(0, df2)
But I get ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). If I add df1.any(), no change is made.
I've also tried and failed to use other solutions...
df4 = df1.apply(lambda x: df2 if x == 0 else df3, axis=1)
df4 = df1.loc[df1 == 1, df3] == df2
df4 = df1.select([df1 > 0], [df3], default=df2)
One thing I'm concerned about is that if I replace all the 1s in df4 with a return from df3, and at some point that return happens to be 0, then a second replace for all the 0s in df4 may overwrite a 0 that should be left alone.
Any help to educate me on the most efficient way to do this is very much appreciated.
Use Series.where() and specify the column names.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html
>>> df3.where(df1.signal == 1, other=df2.SPX, axis=0)
UST
0 5
1 0
2 1
3 0
4 0
5 5
6 0
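The same idea can be sketched on plain Series, which is closer to what the question describes. This is only an illustration: the date index and the names `signal`, `spx`, `ust` and `strategy` are assumptions made up for the example, not part of the original post.

```python
import pandas as pd

# Hypothetical date index; the question only says the data is indexed on dates.
dates = pd.date_range("2021-01-04", periods=7, freq="D")

signal = pd.Series([0, 0, 1, 1, 0, 0, 1], index=dates, name="signal")
spx = pd.Series([5, 0, 5, 1, 0, 5, 2], index=dates, name="SPX")
ust = pd.Series([-1, 1, 1, 0, 1, -1, 0], index=dates, name="UST")

# Keep the UST return where the signal is 1, otherwise take the SPX return.
# The condition is evaluated once against `signal`, so a UST return of 0
# is kept as-is and is never mistaken for a signal value.
strategy = ust.where(signal == 1, other=spx).rename("strategy")
print(strategy)
```

Because the mask comes from the signal rather than from the values being filled in, the double-replace worry from the question should not arise here.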
Using numpy.where with the DataFrame values
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'signal': [0,0,1,1,0,0,1]})  # binary trading signal
df2 = pd.DataFrame({'SPX': [5,0,5,1,0,5,2]})  # S&P 500 returns
df3 = pd.DataFrame({'UST': [-1,1,1,0,1,-1,0]})  # 10yr Treasury returns
# Nonzero signal values select df3; zeros select df2
data = np.where(df1.values, df3.values, df2.values)
df4 = pd.DataFrame(data)
# or, as a single expression:
# df4 = pd.DataFrame(np.where(df1.values, df3.values, df2.values))
If the DataFrames actually have more columns, you would need to specify which column to use. In that case .values isn't actually necessary:
pd.DataFrame(np.where(df1['signal'],df3['UST'],df2['SPX']))
# or
pd.DataFrame(np.where(df1.signal,df3.UST,df2.SPX))
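One thing to keep in mind is that numpy.where returns a plain array, so the date index is dropped unless it is put back. A minimal sketch, assuming a made-up date index and a hypothetical result name `strategy`:

```python
import numpy as np
import pandas as pd

# Hypothetical date index, invented for this example.
dates = pd.date_range("2021-01-04", periods=7, freq="D")

df1 = pd.DataFrame({"signal": [0, 0, 1, 1, 0, 0, 1]}, index=dates)
df2 = pd.DataFrame({"SPX": [5, 0, 5, 1, 0, 5, 2]}, index=dates)
df3 = pd.DataFrame({"UST": [-1, 1, 1, 0, 1, -1, 0]}, index=dates)

# np.where works positionally and returns an ndarray, so the index is
# reattached explicitly when building the result Series.
df4 = pd.Series(
    np.where(df1["signal"] == 1, df3["UST"], df2["SPX"]),
    index=df1.index,
    name="strategy",
)
print(df4)
```

Since numpy.where does no index alignment, this assumes all three inputs have the same length and are already in the same row order.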
Using numpy.where is pretty fast compared to DataFrame.where.
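To check the speed difference on your own machine, something like the following could be used. It is a rough sketch on random data: the size `n`, the seed, and the helper names `with_numpy` and `with_pandas` are arbitrary choices for illustration, and it times Series.where rather than DataFrame.where. Actual timings will vary with data size and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # arbitrary size chosen for illustration

signal = pd.Series(rng.integers(0, 2, size=n))
spx = pd.Series(rng.normal(size=n))
ust = pd.Series(rng.normal(size=n))


def with_numpy():
    # Returns a NumPy array; no index handling is involved.
    return np.where(signal == 1, ust, spx)


def with_pandas():
    # Returns a Series; pandas aligns `other` on the index.
    return ust.where(signal == 1, other=spx)


t_numpy = timeit.timeit(with_numpy, number=20)
t_pandas = timeit.timeit(with_pandas, number=20)
print(f"numpy.where:  {t_numpy:.4f} s for 20 runs")
print(f"Series.where: {t_pandas:.4f} s for 20 runs")
```

Both functions should produce the same values; the difference is only in the return type and in how much index bookkeeping is done.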