How to iterate through two columns in a pandas dataframe to add the values to a list

I'm trying to evaluate a condition in one pandas column and, depending on the result, take the value from another pandas column and append it to a list.

I tried the following:

def roc_table(df, row_count, signal, returns):
    """
    

    Parameters
    ----------
    df : dataframe
    row_count : length of data
    signal : signal/s
    returns : log returns

    Returns
    -------
    table - hopefully

    """
    df = df.copy()
    
    bins = [-48.13,-38.70, -29.28, -19.85, -10.42, -1.01,
            8.42, 17.85, 27.27, 36.7]
    
    win_above = 0
    lose_above = 0
    lose_below = 0
    win_below = 0
    
    # df = df.sort_values([signal, returns])
     
    for bin in bins:
        k = bin
        for row, value in df.iterrows():
            if row[signal] < k:
                lose_below += row[returns]
            else:
                win_below -= row[returns]
        for row, value in df.iterrows():
            if row[signal] >= k:
                win_above += row[returns]
            else: 
                lose_above -= row[returns]
                
    print(win_above, lose_above, lose_below, win_below)
            
roc_table(df = df_train, row_count = df_train.shape[0],
          signal = 'predicted_RSI_indicator',
          returns = 'log_return')   

But I only get:

Traceback (most recent call last):

  File "<ipython-input-135-cd5513bb0778>", line 50, in <module>
    roc_table(df = df_train, row_count = df_train.shape[0],

  File "<ipython-input-135-cd5513bb0778>", line 32, in roc_table
    if row[signal] < k:

TypeError: 'Timestamp' object is not subscriptable

The index is a datetime stamp.

Here is a sample of the input df:

signal   returns
-.23      .045
2.3      -.09
9.8       1.2

The output would look something like this:

bins      win_above   lose_above   win_below   lose_below
-48.13    123
-38.70    -98
-29.28    100
-19.85    -34
-10.42    567
...

So the idea is: if df[signal] is below the bin, the associated return is added to win_below when it is greater than 0, and to lose_below otherwise.

I'll eventually add a loop for those signals greater than the bin and add those to win_above, lose_above.

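As per the pandas documentation, pandas.DataFrame.iterrows yields "the index of the row and the data of the row as a Series". In your loop the first name, row, is therefore bound to the index label (a Timestamp here) and value is bound to the row data, which is why row[signal] raises the TypeError.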
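To see the order of the pair for yourself, here is a minimal sketch. The three-row frame and its daily datetime index are made up for illustration (the values come from the sample in the question), and the names first and second are arbitrary.

```python
import pandas as pd

# Hypothetical frame with a datetime index, mirroring the question's setup.
df = pd.DataFrame(
    {"signal": [-0.23, 2.3, 9.8], "returns": [0.045, -0.09, 1.2]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)

# iterrows() yields (index label, row Series) pairs, in that order.
first, second = next(df.iterrows())
print(type(first).__name__)   # Timestamp
print(type(second).__name__)  # Series
print(second["signal"])       # -0.23
```

Subscripting first with a column name would reproduce the error from the traceback, because a Timestamp is not subscriptable.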

So, you should be doing (twice in your for loop):

for i, row in df.iterrows():
    ...

instead of:

for row, value in df.iterrows():
    ...
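For completeness, here is a rough sketch of how the per-bin table might be built with the corrected unpacking. It follows the rule described in the question's prose (a positive return counts as a win, anything else as a loss, split by whether the signal is below the bin), which is an assumption on my part and differs from the sign handling in the posted code. The bins argument, the two example bin values, and the datetime index are also illustrative.

```python
import pandas as pd

def roc_table(df, signal, returns, bins):
    """Per-bin sums of returns, split by signal side and by sign of the return."""
    rows = []
    for k in bins:
        totals = {"bins": k, "win_above": 0.0, "lose_above": 0.0,
                  "win_below": 0.0, "lose_below": 0.0}
        # iterrows() yields (index label, row Series); the label is unused here.
        for _, row in df.iterrows():
            side = "below" if row[signal] < k else "above"
            outcome = "win" if row[returns] > 0 else "lose"
            totals[f"{outcome}_{side}"] += row[returns]
        rows.append(totals)
    return pd.DataFrame(rows).set_index("bins")

# Sample data from the question, with a made-up datetime index.
df = pd.DataFrame(
    {"signal": [-0.23, 2.3, 9.8], "returns": [0.045, -0.09, 1.2]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)
table = roc_table(df, "signal", "returns", bins=[-1.01, 8.42])
print(table)
```

Note that the totals are reset for every bin, so each row of the result stands on its own. iterrows is slow on large frames; if that becomes a problem, the same sums can probably be computed with boolean masks instead of a Python loop.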
