How to iterate over two columns in a pandas DataFrame to add values to a list

Question

I'm trying to evaluate a condition on one pandas column and, depending on the result, take the value from another pandas column and append it to a list.

I tried the following:

    def roc_table(df, row_count, signal, returns):
        """


        Parameters
        ----------
        df : dataframe
        row_count : length of data
        signal : signal/s
        returns : log returns

        Returns
        -------
        table - hopefully

        """
        df = df.copy()

        bins = [-48.13, -38.70, -29.28, -19.85, -10.42, -1.01,
                8.42, 17.85, 27.27, 36.7]

        win_above = 0
        lose_above = 0
        lose_below = 0
        win_below = 0

        # df = df.sort_values([signal, returns])

        for bin in bins:
            k = bin
            for row, value in df.iterrows():
                if row[signal] < k:
                    lose_below += row[returns]
                else:
                    win_below -= row[returns]
            for row, value in df.iterrows():
                if row[signal] >= k:
                    win_above += row[returns]
                else:
                    lose_above -= row[returns]

        print(win_above, lose_above, lose_below, win_below)

    roc_table(df = df_train, row_count = df_train.shape[0],
              signal = 'predicted_RSI_indicator',
              returns = 'log_return')

But I only get this error:

    Traceback (most recent call last):
      File "<ipython-input-135-cd5513bb0778>", line 50, in <module>
        roc_table(df = df_train, row_count = df_train.shape[0],
      File "<ipython-input-135-cd5513bb0778>", line 32, in roc_table
        if row[signal] < k:
    TypeError: 'Timestamp' object is not subscriptable

The index is a datetime (timestamp) index.

Here is a sample of the input df:

    signal   returns
    -.23      .045
     2.3     -.09
     9.8      1.2

The output would look something like this:

    bins      win_above   lose_above   win_below   lose_below
    -48.13    123
    -38.70    -98
    -29.28    100
    -19.85    -34
    -10.42    567
    ...

So the idea is: if df[signal] is below the bin, the associated return is added to win_below if it is greater than 0, and to lose_below otherwise.

I'll eventually add a loop for signals greater than the bin and add those returns to win_above and lose_above.
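One way to compute this rule for every bin at once, without iterrows(), is with boolean masks. The sketch below assumes a return strictly greater than 0 counts as a win and anything else (including 0) as a loss, and that "above" means signal >= bin, as in the question's code. The function name roc_table_vectorized is hypothetical, and the sample frame uses the values from the question with a default index.

```python
import pandas as pd


def roc_table_vectorized(df, signal, returns, bins):
    """For each bin edge k, sum returns split by signal < k vs. >= k and by return sign.

    Assumption: a return > 0 is a win; zero or negative is a loss.
    """
    sig = df[signal]
    ret = df[returns]
    win = ret > 0
    rows = []
    for k in bins:
        below = sig < k
        above = ~below  # signal >= k (NaN signals also land here)
        rows.append({
            "bins": k,
            "win_above": ret[above & win].sum(),
            "lose_above": ret[above & ~win].sum(),
            "win_below": ret[below & win].sum(),
            "lose_below": ret[below & ~win].sum(),
        })
    return pd.DataFrame(rows)


bins = [-48.13, -38.70, -29.28, -19.85, -10.42, -1.01, 8.42, 17.85, 27.27, 36.7]
df = pd.DataFrame({"signal": [-0.23, 2.3, 9.8], "returns": [0.045, -0.09, 1.2]})
print(roc_table_vectorized(df, "signal", "returns", bins))
```

Building one mask per bin edge avoids the per-row overhead of iterrows(), which constructs a new Series for every row.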

Answer 1

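As per the pandas documentation, pandas.DataFrame.iterrows yields "the index of the row and the data of the row as a Series". In for row, value in df.iterrows(), the name row therefore receives the index label, which for your DataFrame is a Timestamp; subscripting it with row[signal] is what raises the TypeError.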
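To see what iterrows() actually yields, here is a small sketch built on the sample values from the question. The dates in the index are hypothetical, chosen only to mimic a datetime index like the one in the question.

```python
import pandas as pd

# Sample values from the question; the dates are made up for illustration.
df = pd.DataFrame(
    {"signal": [-0.23, 2.3, 9.8], "returns": [0.045, -0.09, 1.2]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)

for idx, row in df.iterrows():
    # idx is the index label (a Timestamp); row is a Series keyed by column name.
    print(type(idx).__name__, row["signal"], row["returns"])
```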

So you should write this (in both of your for loops):

    for i, row in df.iterrows():
        ...

instead of:

    for row, value in df.iterrows():
        ...
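Applied to the question's function, the corrected unpacking might look like the sketch below. Beyond the iterrows() fix, it also assumes several things about the intent: the per-row logic follows the rule described in the question (split by the sign of the return instead of adding to one bucket and subtracting from another), the totals are reset for each bin, the unused row_count parameter is dropped, bins is passed in, and the result is returned as a DataFrame.

```python
import pandas as pd


def roc_table(df, signal, returns, bins):
    """Sum returns into win/lose buckets above and below each bin edge.

    Assumption: a return > 0 is a win; zero or negative is a loss.
    """
    records = []
    for k in bins:
        # Start fresh totals for every bin edge.
        win_above = lose_above = win_below = lose_below = 0.0
        # iterrows() yields (index, row); the index (a Timestamp here) is not needed.
        for _, row in df.iterrows():
            r = row[returns]
            if row[signal] < k:
                if r > 0:
                    win_below += r
                else:
                    lose_below += r
            elif r > 0:
                win_above += r
            else:
                lose_above += r
        records.append({"bins": k, "win_above": win_above, "lose_above": lose_above,
                        "win_below": win_below, "lose_below": lose_below})
    return pd.DataFrame(records)


# Sample values from the question, with a default integer index for brevity.
df = pd.DataFrame({"signal": [-0.23, 2.3, 9.8], "returns": [0.045, -0.09, 1.2]})
print(roc_table(df, "signal", "returns", [-1.01, 8.42]))
```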

Answer 1: score 0, posted 2021-05-12 17:21:54
