Create a pandas Series based on a condition on df1, using values from df2 or df3

Question

First post here. I'm new to Python, but have made a lot of progress leveraging the answers posted here to others' questions. Unfortunately, I'm having trouble with what seems to be an easy task. I have 3 pandas Series, indexed on dates:

df1 = {'signal': [0,0,1,1,0,0,1]}  #binary trading signal

df2 = {'SPX': [5,0,5,1,0,5,2]}     #S&P 500 returns

df3 = {'UST': [-1,1,1,0,1,-1,0]}   #10yr Treasury returns

I am trying to create a new Series, df4, that will represent the return profile of the trading signal. If the signal = 1, get the df3 value on that day; otherwise give me the df2 value (that is, for all the zeros).

I've found plenty of posts on this topic, which seems very simple, but have struggled to make them work. I tried a simple if statement...

df4 = df1
    if df1 == 1:
        df4.replace(1, df3)
    else:
        df4.replace(0, df2)

But I get ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). If I add df1.any(), no change is made.
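The error arises because `df1 == 1` compares element by element and returns a Boolean Series rather than a single True or False, which `if` cannot interpret. A minimal sketch of that behavior, assuming the signal is held in a pandas Series (the variable names and the default integer index are illustrative, not from the original post):

```python
import pandas as pd

# Assumed stand-in for df1: the question's signal data as a Series.
signal = pd.Series([0, 0, 1, 1, 0, 0, 1], name="signal")

# Element-wise comparison: one Boolean per row, not a single truth value.
mask = signal == 1

# Using the mask directly in `if` raises the ValueError quoted above.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# .any() collapses the mask to one bool, so an `if` on it would run once
# for the whole Series instead of once per row.
any_true = bool(mask.any())
```

Adding `.any()` silences the error but reduces the mask to a single Boolean for the whole Series. In addition, `replace` returns a new object rather than modifying `df4` in place, which likely explains why no change was seen.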

I've also tried and failed to use other solutions...

df4 = df1.apply(lambda x: df2 if x == 0 else df3, axis=1)

df4 = df1.loc[df1 == 1, df3] == df2

df4 = df1.select([df1 > 0], [df3], default=df2)

One thing I'm concerned about is that if I replace all the 1s in df4 with a return from df3 and at some point the value just so happens to be 0, then a second replace of all the 0s in df4 may overwrite a 0 that should be left alone.

Any help educating me on the most efficient way to do this is very much appreciated.

Answer 1

Use Series.where() and specify the column names.

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html

>>> df3.where(df1.signal == 1, other=df2.SPX, axis=0)
  UST
0   5
1   0
2   1
3   0
4   0
5   5
6   0
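The same selection can be sketched with three Series indexed on dates, as described in the question. The dates and the variable names below are hypothetical, chosen only for illustration:

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=7, freq="D")  # hypothetical dates
signal = pd.Series([0, 0, 1, 1, 0, 0, 1], index=dates, name="signal")
spx = pd.Series([5, 0, 5, 1, 0, 5, 2], index=dates, name="SPX")
ust = pd.Series([-1, 1, 1, 0, 1, -1, 0], index=dates, name="UST")

# Keep the UST value where signal == 1; otherwise take the SPX value.
strategy = ust.where(signal == 1, other=spx).rename("strategy")
print(strategy.tolist())
```

Because the choice is made in a single pass from the original signal, a UST return of 0 on a signal day stays 0 and is never touched by a later replacement, which sidesteps the two-step replace concern raised in the question.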

Answer 2

Using numpy.where with the DataFrame values:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'signal': [0,0,1,1,0,0,1]})  # binary trading signal
df2 = pd.DataFrame({'SPX': [5,0,5,1,0,5,2]})     # S&P 500 returns
df3 = pd.DataFrame({'UST': [-1,1,1,0,1,-1,0]})   # 10yr Treasury returns

# Where the signal is nonzero take df3, otherwise take df2.
data = np.where(df1.values, df3.values, df2.values)
df4 = pd.DataFrame(data)
# Or in one line:
# df4 = pd.DataFrame(np.where(df1.values, df3.values, df2.values))

If the DataFrames actually have more columns, you would need to specify which ones to use. Note that .values isn't actually necessary:

pd.DataFrame(np.where(df1['signal'],df3['UST'],df2['SPX']))
# or
pd.DataFrame(np.where(df1.signal,df3.UST,df2.SPX))
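One detail worth noting is that `numpy.where` returns a plain ndarray, so a date index is not carried over to the result. A minimal sketch of re-attaching it, assuming all three DataFrames share the same dates in the same row order (the dates and the name `strategy` are hypothetical):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-01", periods=7, freq="D")  # hypothetical dates
df1 = pd.DataFrame({"signal": [0, 0, 1, 1, 0, 0, 1]}, index=dates)
df2 = pd.DataFrame({"SPX": [5, 0, 5, 1, 0, 5, 2]}, index=dates)
df3 = pd.DataFrame({"UST": [-1, 1, 1, 0, 1, -1, 0]}, index=dates)

# np.where works positionally and returns an ndarray without an index.
values = np.where(df1["signal"] == 1, df3["UST"], df2["SPX"])

# Re-attach the date index explicitly to get a labeled Series back.
df4 = pd.Series(values, index=df1.index, name="strategy")
```

Since `numpy.where` ignores index labels, this only gives the intended result when the rows of the three inputs already line up.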

Using numpy.where is pretty fast compared to DataFrame.where.
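The speed difference can be checked on your own machine with a rough timing sketch such as the one below. It uses randomly generated data of an arbitrary size and compares `numpy.where` against `Series.where` (rather than `DataFrame.where`); actual timings depend on the data size and library versions, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # arbitrary size chosen for illustration
signal = pd.Series(rng.integers(0, 2, size=n))
spx = pd.Series(rng.normal(size=n))
ust = pd.Series(rng.normal(size=n))


def with_numpy():
    # Returns an ndarray.
    return np.where(signal == 1, ust, spx)


def with_pandas():
    # Returns a Series that keeps the index.
    return ust.where(signal == 1, other=spx)


# Both approaches should select the same values.
same = np.array_equal(with_numpy(), with_pandas().to_numpy())

numpy_seconds = timeit.timeit(with_numpy, number=20)
pandas_seconds = timeit.timeit(with_pandas, number=20)
print(f"numpy.where: {numpy_seconds:.4f}s  Series.where: {pandas_seconds:.4f}s")
```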
