How to iterate through two columns in a pandas dataframe to add the values to a list
I'm trying to evaluate a condition in one pandas column and, depending on the result, take the value from another pandas column and append it to a list.
I tried the following:
def roc_table(df, row_count, signal, returns):
    """
    Parameters
    ----------
    df : dataframe
    row_count : length of data
    signal : signal/s
    returns : log returns

    Returns
    -------
    table - hopefully
    """
    df = df.copy()
    bins = [-48.13, -38.70, -29.28, -19.85, -10.42, -1.01,
            8.42, 17.85, 27.27, 36.7]
    win_above = 0
    lose_above = 0
    lose_below = 0
    win_below = 0
    # df = df.sort_values([signal, returns])
    for bin in bins:
        k = bin
        for row, value in df.iterrows():
            if row[signal] < k:
                lose_below += row[returns]
            else:
                win_below -= row[returns]
        for row, value in df.iterrows():
            if row[signal] >= k:
                win_above += row[returns]
            else:
                lose_above -= row[returns]
        print(win_above, lose_above, lose_below, win_below)

roc_table(df = df_train, row_count = df_train.shape[0],
          signal = 'predicted_RSI_indicator',
          returns = 'log_return')
But I only get:
Traceback (most recent call last):
  File "<ipython-input-135-cd5513bb0778>", line 50, in <module>
    roc_table(df = df_train, row_count = df_train.shape[0],
  File "<ipython-input-135-cd5513bb0778>", line 32, in roc_table
    if row[signal] < k:
TypeError: 'Timestamp' object is not subscriptable
The index is a datetime timestamp.
Here is a sample of the input df:
signal returns
-.23 .045
2.3 -.09
9.8 1.2
The output would look something like this:
bins win_above lose_above win_below lose_below
-48.13 123
-38.70 -98
-29.28 100
-19.85 -34
-10.42 567
...
So the idea is: if df[signal] is below the bin, the associated return is added to win_below if it is greater than 0; otherwise it is added to lose_below.
I'll eventually add a loop for the signals greater than the bin and add those returns to win_above and lose_above.
As per the pandas documentation, pandas.DataFrame.iterrows yields "the index of the row and the data of the row as a Series". In your loop, the first variable (row) is therefore bound to the index, which is a Timestamp, and that is why subscripting it raises the TypeError.
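To make the tuple order concrete, here is a minimal sketch on a tiny frame built from the sample values in the question. The column names and the dates in the index are made up for illustration and are not the asker's real data.

```python
import pandas as pd

# Illustrative frame: values from the question's sample, dates are invented.
df = pd.DataFrame(
    {"signal": [-0.23, 2.3, 9.8], "returns": [0.045, -0.09, 1.2]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)

# iterrows yields (index, row) pairs: the index label comes first.
first_idx, first_row = next(df.iterrows())

print(type(first_idx).__name__)  # Timestamp
print(type(first_row).__name__)  # Series
print(first_row["signal"])       # -0.23
```

Subscripting first_idx, as the original loop effectively does, is what fails, because a Timestamp does not support indexing by column name.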
So, you should be doing (twice in your for loop):
for i, row in df.iterrows():
    ...
instead of:
for row, value in df.iterrows():
    ...
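Putting the fix together, the following is only a sketch of how the per-bin table might be built with the corrected unpacking. The function name roc_table_sketch, the two-element bin list, and the sample dates are hypothetical. The win/lose rule follows the prose description in the question (a positive return counts as a win, anything else as a loss), not the += / -= choices in the posted code, and mirroring that rule for the "above" side is an assumption.

```python
import pandas as pd

def roc_table_sketch(df, signal, returns, bins):
    """Hypothetical helper: sum returns into win/lose buckets for each bin."""
    records = []
    for threshold in bins:
        # Reset the counters so every bin gets its own totals.
        win_above = lose_above = win_below = lose_below = 0.0
        # Corrected unpacking: index first (unused here), then the row Series.
        for _, row in df.iterrows():
            ret = row[returns]
            if row[signal] < threshold:
                if ret > 0:
                    win_below += ret
                else:
                    lose_below += ret
            else:
                # Assumption: the "above" side mirrors the "below" rule.
                if ret > 0:
                    win_above += ret
                else:
                    lose_above += ret
        records.append({
            "bins": threshold,
            "win_above": win_above,
            "lose_above": lose_above,
            "win_below": win_below,
            "lose_below": lose_below,
        })
    return pd.DataFrame(records).set_index("bins")

# Illustrative data: values from the question's sample, dates are invented.
sample = pd.DataFrame(
    {"signal": [-0.23, 2.3, 9.8], "returns": [0.045, -0.09, 1.2]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)
table = roc_table_sketch(sample, "signal", "returns", bins=[-1.01, 8.42])
print(table)
```

Collecting one dictionary per bin and building the DataFrame at the end avoids growing a frame inside the loop. For large inputs, boolean masks on the columns would likely be faster than iterrows, but the loop stays closest to the code in the question.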