Pandas column that depends on its previous value (row)?

Question

I would like to create a third column in my DataFrame whose value in each row depends on both the new column and the existing columns in the previous row.

This new column should start at 0.

If its previous value was 0, its next value is its previous value plus df.below_lo[i].

If its previous value was 1, its next value is its previous value plus df.above_hi[i].

I think I have two issues: how to initialize this third column, and how to make it depend on itself.

import pandas as pd
import math

data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}

df = pd.DataFrame(data)

df['pos'] = math.nan
df['pos'][0] = 0

for i in range(len(df.below_lo)):
    if df.pos[i] == 0:
        df.pos[i+1] = df.pos[i] + df.below_lo[i]
    if df.pos[i] == 1:
        df.pos[i+1] = df.pos[i] + df.above_hi[i]

print(df)

The desired output is:

    below_lo  above_hi  pos
0        0.0       0.0  0.0
1        1.0       0.0  0.0
2        0.0      -1.0  1.0
3        0.0       0.0  0.0
4        0.0      -1.0  0.0
5        0.0       0.0  0.0
6        0.0      -1.0  0.0
7        0.0       0.0  0.0
8        0.0       0.0  0.0
9        1.0       0.0  0.0
10       0.0       0.0  1.0
11       0.0       0.0  1.0
12       0.0       0.0  1.0
13       NaN       NaN  1.0

The above code produces the correct output, but I also get several of these warning messages:

A value is trying to be set on a copy of a slice from a DataFrame

How do I clean this code up so that it runs without throwing this warning?

Answer 1

Use .loc:

df.loc[0, 'pos'] = 0

for i in range(len(df.below_lo)):
    if df.loc[i, 'pos'] == 0:
        df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'below_lo']
    if df.loc[i, 'pos'] == 1:
        df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'above_hi']
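For reference, here is a self-contained sketch of the same .loc approach, using the data from the question. It makes one change that is my assumption rather than part of the answer: the loop stops one row early. With the full range(len(df)), the last iteration assigns to label 13, and .loc then appends a new row, which is where the NaN row in the question's desired output comes from. Drop the "- 1" if you do want that extra row.

```python
import math

import pandas as pd

data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data)

df['pos'] = math.nan   # placeholder until each row is computed
df.loc[0, 'pos'] = 0   # one .loc call selects row and column together

# Assumption: stop one row early so nothing is written past the last label.
for i in range(len(df) - 1):
    prev = df.loc[i, 'pos']
    if prev == 0:
        df.loc[i + 1, 'pos'] = prev + df.loc[i, 'below_lo']
    elif prev == 1:
        df.loc[i + 1, 'pos'] = prev + df.loc[i, 'above_hi']

print(df)
```

The warning in the question comes from chained indexing such as df['pos'][0] = 0, where the first lookup may return a temporary object. A single df.loc[row, column] assignment operates on df directly, so pandas has nothing to warn about.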

Answer 2

I appreciate that there is already an accepted, and perfectly good, answer by @Michael O., but if you dislike iterating over rows as not quite Pandas-esque, here is a solution without an explicit loop over rows:

from functools import reduce
res = reduce(lambda d, _ : 
    d.fillna({'pos':d['pos'].shift(1) 
            + (d['pos'].shift(1)  == 0) * d['below_lo'] 
            + (d['pos'].shift(1)  == 1) * d['above_hi']}), 
        range(len(df)), df)
res

produces

      below_lo    above_hi    pos
--  ----------  ----------  -----
 0           0           0      0
 1           1           0      1
 2           0          -1      0
 3           0           0      0
 4           0          -1      0
 5           0           0      0
 6           0          -1      0
 7           0           0      0
 8           0           0      0
 9           1           0      1
10           0           0      1
11           0           0      1
12           0           0      1

It is, admittedly, somewhat less efficient and its syntax is a bit more obscure. But it could be written on a single line (even if I split it over a few for clarity)!

The idea is that we can use the fillna(..) function, passing it a value calculated from the previous value of 'pos' (hence shift(1)) and the current values of 'below_lo' and 'above_hi'. The extra complication here is that a single call only fills the NaN in the row directly below a row that already has a non-NaN value. Hence we need to apply this function repeatedly until all NaNs are filled, and this is where reduce comes into play.
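To make the "one row per pass" behaviour concrete, here is a small sketch that applies the fill step once, twice, and then through reduce. The helper name fill_one_step is hypothetical, and it uses Series.fillna with assign instead of the dict form of DataFrame.fillna shown above; I am assuming the two are equivalent for this purpose. It expects df to hold the original 13 rows with 'pos' set to NaN everywhere except row 0.

```python
import math
from functools import reduce

import pandas as pd

data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data)
df['pos'] = math.nan
df.loc[0, 'pos'] = 0  # seed value; every other row is still NaN


def fill_one_step(d, _=None):
    """Fill NaNs in 'pos' from the previous row's 'pos' (hypothetical helper)."""
    prev = d['pos'].shift(1)
    candidate = prev + (prev == 0) * d['below_lo'] + (prev == 1) * d['above_hi']
    # candidate is NaN wherever the previous 'pos' is still NaN, so each pass
    # only fills the row directly below the rows that are already filled.
    return d.assign(pos=d['pos'].fillna(candidate))


step1 = fill_one_step(df)
step2 = fill_one_step(step1)
print(step1['pos'].notna().sum())  # 2
print(step2['pos'].notna().sum())  # 3

# Repeating the step len(df) times is enough to fill every row.
res = reduce(fill_one_step, range(len(df)), df)
print(res['pos'].tolist())
```

Note that this recurrence reads 'below_lo' and 'above_hi' from the current row, whereas the loop in the question reads them from the previous row, so the resulting 'pos' column is offset by one row relative to the question's desired output.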

Answer 1: score 2, accepted, 2020-12-01 17:21:36
Answer 2: score 0, 2020-12-01 20:11:42
