Pandas: apply a function to each row based on the previous rows

Question

I have a DataFrame like this:

          date  compound_score  negativity_score  positive_score  \
0   2017-12-10        0.361400          0.339500        0.311000   
1   2017-12-11        0.639950          0.216000        0.476000   
2   2017-12-12        0.554286          0.262000        0.464000   
3   2017-12-13        0.715275          0.232250        0.423750   
4   2017-12-14        0.760940          0.221600        0.476200   
5   2017-12-15        0.503886          0.241429        0.391000   
6   2017-12-16        0.372300          0.345333        0.356667   
7   2017-12-17        0.700900          0.163000        0.458000   
8   2017-12-18        0.369733          0.220667        0.364222   
9   2017-12-19        0.176000          0.304000        0.362000   
10  2017-12-20        0.474322          0.262222        0.426778   
11  2017-12-21        0.623620          0.224000        0.435200   
12  2017-12-22        0.488125          0.211375        0.438000   
13  2017-12-23        0.226900          0.121500        0.341500   
14  2017-12-24        0.461800          0.233000        0.545000   
15  2017-12-25        0.686040          0.270800        0.458600   
16  2017-12-26        0.760525          0.212750        0.527250   
17  2017-12-27        0.627575          0.122250        0.463500   
18  2017-12-28        0.579173          0.210182        0.381909   
19  2017-12-29        0.378815          0.239000        0.339846   
20  2017-12-30        0.428200          0.328000        0.349000   
21  2017-12-31       -0.116800          0.507000        0.295000   
22  2018-01-01        0.515433          0.315000        0.417000   
23  2018-01-02        0.380250          0.298250        0.366250   
24  2018-01-03        0.609657          0.277000        0.458714   
25  2018-01-04        0.751067          0.251667        0.465000   
26  2018-01-05        0.207000          0.255750        0.324500   
27  2018-01-06        0.853200          0.127000        0.253000   
28  2018-01-07        0.506800          0.284500        0.350500   
29  2018-01-08        0.499586          0.262571        0.388571   
    neutral_score  compound_diff  consecutive_compound  
0        0.349500            NaN                     0  
1        0.308000       0.278550                     1  
2        0.274143      -0.085664                     0  
3        0.344000       0.160989                     1  
4        0.302200       0.045665                     1  
5        0.367429      -0.257054                     0  
6        0.298000      -0.131586                     0  
7        0.379000       0.328600                     1  
8        0.415111      -0.331167                     0  
9        0.333800      -0.193733                     0  
10       0.311000       0.298322                     1  
11       0.340800       0.149298                     1  
12       0.350375      -0.135495                     0  
13       0.537500      -0.261225                     0  
14       0.222000       0.234900                     1  
15       0.270800       0.224240                     1  
16       0.260000       0.074485                     1  
17       0.414000      -0.132950                     0  
18       0.407909      -0.048402                     0  
19       0.420923      -0.200357                     0  
20       0.323000       0.049385                     1  
21       0.197000      -0.545000                     0  
22       0.268000       0.632233                     1  
23       0.335250      -0.135183                     0  
24       0.264429       0.229407                     1  
....

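I want to apply a calculation to the DataFrame that depends on the 14 rows preceding each row.

I tried passing a shifted DataFrame from the row itself, but I can't quite work out how to hand the function the current row and have it look back 14 days.

I tried the following; each attempt either returned NaN or raised an error: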

def get_up_down_pct_ratio(df):
    up_days_pct = df.loc[df[COMPOUND_DIFF] > 0, COMPOUND_DIFF].sum()
    fall_days_pct = df.loc[df[COMPOUND_DIFF] < 0, COMPOUND_DIFF].sum()
    total = up_days_pct + fall_days_pct
    return percent(up_days_pct, total)
d['up_down_ratio'] = d.apply(lambda x: get_up_down_pct_ratio(x.shift(14)),axis=1)

This just assigns NaN to the column.

def get_up_down_pct_ratio(row):
    up_days_pct = row[row['compound_diff'] > 0, 'compound_diff'].sum()
    fall_days_pct = row[row['compound_diff'] > 0, 'compound_diff'].sum()
    total = up_days_pct + fall_days_pct
    return percent(up_days_pct, total)
a['up_down_pct_ration'] = a.apply(lambda row: get_up_down_pct_ratio(row))

This raises:

ValueError: key of type tuple not found and not a MultiIndex

Answer 1

A few things to note:

apply() needs axis=1
The NaN case needs to be handled
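
To make the two notes above concrete, here is a minimal sketch on made-up toy data (the values and the variable names `col_lengths` and `gains` are illustrative, not from the question). It shows that `apply()` passes whole columns by default and single rows with `axis=1`, and one possible way to treat a NaN as "no change".

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the leading NaN mimics compound_diff.
df = pd.DataFrame({"compound_diff": [np.nan, 0.28, -0.09, 0.16]})

# Default (axis=0): the function receives each column as a Series.
col_lengths = df.apply(lambda col: len(col))

# axis=1: the function receives one row at a time,
# so row["compound_diff"] is a single value.
gains = df.apply(
    lambda row: 0.0
    if pd.isna(row["compound_diff"])      # treat NaN as "no change" (an assumption)
    else max(row["compound_diff"], 0.0),  # keep only positive moves
    axis=1,
)
```

With `axis=1` the lambda sees scalars, which is why a row-wise comparison such as `row["compound_diff"] > 0` behaves as expected there.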

Below is a different approach: create a class that accumulates a rolling 14-day window and handles all the cases, UP days versus FALL days.

import math

class accumulate(object):
    """Keeps the last 14 values and returns their running sum."""
    def __init__(self):
        self.accumList = [0.0 for n in range(14)]
    def newDate(self, v, up=True):
        # drop the oldest value to make room for the new one
        self.accumList[0:13] = self.accumList[1:]
        v = float(v)
        if math.isnan(v):
            # treat NaN as no change
            v = 0.0
        elif up and (v < 0) :
            # tracking gains: ignore negative values
            v = 0.0
        elif (not up) and (v > 0) :
            # tracking losses: ignore positive values
            v = 0.0
        self.accumList[13] = v

        return sum(self.accumList)
a = accumulate()
df['up'] = df.apply(lambda r: a.newDate(r.compound_diff), axis=1)
a = accumulate() # restart rolling amounts
df['fall'] = df.apply(lambda r: a.newDate(r.compound_diff, up=False), axis=1)
df['pct'] = df.up / (df.up + df.fall)
df.head()
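
As an alternative to the `accumulate` class above, the same running sums can probably be obtained with pandas' built-in rolling windows. This is a sketch under assumptions, not the answer's own method: the helper name `rolling_up_fall` is hypothetical, and `min_periods=1` is chosen so that early rows sum whatever is available, which mirrors the zero-padded list in the class.

```python
import numpy as np
import pandas as pd

def rolling_up_fall(diff: pd.Series, window: int = 14) -> pd.DataFrame:
    """Rolling sums of gains and losses over the current row and the
    (window - 1) rows before it. Hypothetical helper for illustration."""
    clean = diff.fillna(0.0)  # treat NaN as "no change", as the class does
    up = clean.clip(lower=0.0).rolling(window, min_periods=1).sum()
    fall = clean.clip(upper=0.0).rolling(window, min_periods=1).sum()
    return pd.DataFrame({"up": up, "fall": fall})

# First rows of compound_diff from the question; a window of 3 keeps it short.
diff = pd.Series([np.nan, 0.278550, -0.085664, 0.160989, 0.045665, -0.257054])
out = rolling_up_fall(diff, window=3)
```

In both versions `fall` is a sum of non-positive values, so `up + fall` is the net change over the window and can be zero or negative. If a share between 0 and 1 is wanted, `up / (up - fall)` is one option.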

Answer 2

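@frankr6591's answer didn't get me all the way to what I needed, but it did point me in the right direction.

I need to apply this logic to the DataFrame in several ways, so I wrote a simpler, more generic function. It still needs optimizing, but for now it works well with the different columns passed to it: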

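def calculate_two_weeks_data(new_col_name, col_to_run_on):
    # Note: works on the module-level DataFrame `df` and modifies it in place
    def calculate_ratio_value(row, df_, col):
        # after reset_index(), 'index' holds the old index label, used here
        # as the row position (assumes a default 0..n-1 index)
        index = row['index']
        start_idx = index - 14
        if start_idx < 0:
            # fewer than 14 previous rows available
            return None
        else:
            # the 14 rows before the current one (current row excluded)
            prev_rows = df_.iloc[start_idx:index]
            col_to_list = prev_rows[col].tolist()
            up_values = 0
            down_values = 0
            for value in col_to_list:
                if value > 0:
                    up_values += value
                else:
                    # zeros, negatives and NaN all end up here
                    down_values += value
            up_ratio = up_values / (up_values + down_values)
            return up_ratio

    df.reset_index(inplace=True)
    df[new_col_name] = df.apply(calculate_ratio_value, args=[df, col_to_run_on], axis=1)
    # rows without a full 14-row history (or with NaN in the window) are dropped
    df.dropna(inplace=True)
    return df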
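
The function above mentions needing more optimization. One possible vectorized sketch of the same idea, assuming the goal is the ratio over the 14 rows before the current one: `shift(1)` excludes the current row, and `rolling(window)` with its default `min_periods` yields NaN until a full window without NaN is available. The helper name `previous_window_ratio` is hypothetical.

```python
import numpy as np
import pandas as pd

def previous_window_ratio(diff: pd.Series, window: int = 14) -> pd.Series:
    """Sum of gains divided by the net change over the `window` rows
    preceding each row. Hypothetical helper for illustration."""
    prev = diff.shift(1)  # drop the current row from its own window
    up = prev.clip(lower=0.0).rolling(window).sum()
    down = prev.clip(upper=0.0).rolling(window).sum()
    return up / (up + down)

# First rows of compound_diff from the question; a window of 3 keeps it short.
diff = pd.Series(
    [np.nan, 0.278550, -0.085664, 0.160989, 0.045665, -0.257054, -0.131586]
)
ratio = previous_window_ratio(diff, window=3)
```

Unlike the row-by-row loop, this leaves NaN in place rather than dropping rows, and a zero denominator produces `inf` or NaN instead of raising `ZeroDivisionError`.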

Answer 1
1 2020-11-24 15:20:19

Answer 2
0 2020-11-24 17:31:06
