In a pandas DataFrame, how do I append a new True / False column based on each row's values?
I'm trying to create a dataframe of stock prices and append a True/False column for each row based on certain conditions.
ind = [0,1,2,3,4,5,6,7,8]
close = [10,20,30,40,30,20,30,40,50]
open = [11,21,31,41,31,21,31,41,51]
upper = [11,21,31,41,31,21,31,41,51]
mid = [11,21,31,41,31,21,31,41,51]
cond1 = [True,True,True,False,False,True,False,True,True]
cond2 = [True,True,False,False,False,False,False,False,False]
cond3 = [True,True,False,False,False,False,False,False,False]
cond4 = [True,True,False,False,False,False,False,False,False]
cond5 = [True,True,False,False,False,False,False,False,False]
def check_conds(df, latest_price):
    '''1st set of INT for early breakout of bollinger upper'''
    df.loc[:, ('cond1')] = df.close.shift(1) > df.upper.shift(1)
    df.loc[:, ('cond2')] = df.open.shift(1) < df.mid.shift(1).rolling(6).min()
    df.loc[:, ('cond3')] = df.close.shift(1).rolling(7).min() <= 21
    df.loc[:, ('cond4')] = df.upper.shift(1) < df.upper.shift(2)
    df.loc[:, ('cond5')] = df.mid.tail(3).max() < 30
    df.loc[:, ('Overall')] = all([df.cond1,df.cond2,df.cond3,df.cond4,df.cond5])
    return df
The original 9-row by 4-column dataframe contains only the close / open / upper / mid columns.
Without the 'Overall' line, the check_conds function returns the df nicely with the new cond1-5 columns of True / False appended for each row, resulting in a dataframe of 9 rows by 9 columns.
However, when I tried to apply another piece of logic to provide an 'Overall' True / False based on cond1-5 for each row, I received "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
df.loc[:, ('Overall')] = all([df.cond1,df.cond2,df.cond3,df.cond4,df.cond5])
So I tried pulling out each of cond1-5, and those are indeed Series of True / False. How do I get that last line in the function to check each row's cond1-5 and return True if all of cond1-5 are True for that row?
I just can't wrap my head around why the cond1-5 lines in the function work fine, just comparing the values within each row, while the last line above (written in a similar style) is returning an entire series.
Please advise!
The error tells you to use pd.DataFrame.all. To check that all values are true per row for all conditions, you have to specify the argument axis=1:
df.loc[:, df.columns.str.startswith('cond')].all(axis=1)
Note that df.columns.str.startswith('cond') is just a lazy way of selecting all columns that start with 'cond'. Of course, you can achieve the same with df[['cond1', 'cond2', 'cond3', 'cond4', 'cond5']].
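As for why the original line fails: Python's built-in all() calls bool() on each item in the list, and pandas refuses to collapse a whole Series into a single True / False, which is what raises the ValueError. The cond1-5 lines work because comparison operators such as > and < are applied element by element. Below is a minimal sketch of the row-wise fix. The small dataframe is made up for illustration (only three cond columns and four rows, not the asker's real data), and the names raised, cond_cols and overall_and are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, shortened from the question for illustration.
df = pd.DataFrame({
    "close": [10, 20, 30, 40],
    "cond1": [True, True, True, False],
    "cond2": [True, True, False, False],
    "cond3": [True, False, False, False],
})

# Built-in all() calls bool() on each Series, which pandas refuses to do.
try:
    all([df.cond1, df.cond2, df.cond3])
    raised = False
except ValueError:
    raised = True

# Select the cond* columns, then reduce across columns (axis=1)
# to get one boolean per row.
cond_cols = df.columns[df.columns.str.startswith("cond")]
df["Overall"] = df[cond_cols].all(axis=1)

# Equivalent element-wise form using the & operator.
overall_and = df.cond1 & df.cond2 & df.cond3

print(df)
```

Inside check_conds, the same idea would replace the built-in all(...) call on the 'Overall' line. The & form is handy when there are only a few conditions, while all(axis=1) scales better as more cond columns are added.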