Pandas column that depends on its previous value (row)?
I would like to create a 3rd column in my dataframe, which depends on both the new and existing columns in the previous row.
I would like my 3rd column to start at 0.
If its previous value was 0, its next value is its previous value plus df.below_lo[i].
If its previous value was 1, its next value is its previous value plus df.above_hi[i].
I think I have two issues: how to initialize this 3rd column and how to make it dependent on itself.
import pandas as pd
import math

data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data)
df['pos'] = math.nan
df['pos'][0] = 0
for i in range(len(df.below_lo)):
    if df.pos[i] == 0:
        df.pos[i+1] = df.pos[i] + df.below_lo[i]
    if df.pos[i] == 1:
        df.pos[i+1] = df.pos[i] + df.above_hi[i]
print(df)
The desired output is:
    below_lo  above_hi  pos
0        0.0       0.0  0.0
1        1.0       0.0  0.0
2        0.0      -1.0  1.0
3        0.0       0.0  0.0
4        0.0      -1.0  0.0
5        0.0       0.0  0.0
6        0.0      -1.0  0.0
7        0.0       0.0  0.0
8        0.0       0.0  0.0
9        1.0       0.0  0.0
10       0.0       0.0  1.0
11       0.0       0.0  1.0
12       0.0       0.0  1.0
13       NaN       NaN  1.0
The above code produces the correct output, except I am also getting a few of these error messages:
A value is trying to be set on a copy of a slice from a DataFrame
How do I clean this code up so that it runs without throwing this warning?
Use .loc:
df.loc[0, 'pos'] = 0
for i in range(len(df.below_lo)):
    if df.loc[i, 'pos'] == 0:
        df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'below_lo']
    if df.loc[i, 'pos'] == 1:
        df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'above_hi']
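If the extra trailing row that appears when the last iteration writes to i+1 is not wanted, one possible variation is to build the column in a plain Python list and assign it once. This is only a sketch under the assumption that the rule is the same as in the loop above (row i+1 depends on row i); the names pos and prev are illustrative, and carrying forward any value other than 0 or 1 is my assumption, not something the question specifies.

```python
import pandas as pd

data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data)

pos = [0]  # the new column starts at 0
# Each step uses the previous row's values, mirroring the loop above,
# but stops one row early so the frame is not enlarged.
for below, above in zip(df['below_lo'].iloc[:-1], df['above_hi'].iloc[:-1]):
    prev = pos[-1]
    if prev == 0:
        pos.append(prev + below)
    elif prev == 1:
        pos.append(prev + above)
    else:
        pos.append(prev)  # assumption: any other value is carried forward

df['pos'] = pos  # single whole-column assignment, no chained indexing
print(df)
```

Because the column is assigned in one step, there is no item-by-item write into the DataFrame, which is where the warning in the question comes from.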
I appreciate there is already an accepted, and perfectly good, answer by @Michael O., but if you dislike iterating over rows as not quite Pandas-esque, here is a solution without an explicit loop over rows:
from functools import reduce
res = reduce(lambda d, _:
             d.fillna({'pos': d['pos'].shift(1)
                       + (d['pos'].shift(1) == 0) * d['below_lo']
                       + (d['pos'].shift(1) == 1) * d['above_hi']}),
             range(len(df)), df)
res
produces
      below_lo    above_hi    pos
--  ----------  ----------  -----
 0           0           0      0
 1           1           0      1
 2           0          -1      0
 3           0           0      0
 4           0          -1      0
 5           0           0      0
 6           0          -1      0
 7           0           0      0
 8           0           0      0
 9           1           0      1
10           0           0      1
11           0           0      1
12           0           0      1
It is, admittedly, somewhat less efficient and has slightly more obscure syntax. But it could be written on a single line (even if I split it over a few for clarity)!
The idea is that we can use the fillna(..) function by passing it a value calculated from the previous value of 'pos' (hence shift(1)) and the current values of 'below_lo' and 'above_hi'. The extra complication here is that this operation will only fill a NaN in the row just below one with a non-NaN value. Hence we need to apply this function repeatedly until all NaNs are filled, and this is where reduce comes into play.
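To see why the repetition is needed, a single application of the fill step can be run on its own. The following is a small sketch, assuming a fresh frame whose 'pos' is NaN everywhere except row 0; step is a hypothetical helper name, the four-row data is made up for brevity, and it uses Series.fillna on the 'pos' column as a stand-in for the DataFrame-level fillna call above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'below_lo': [0, 1, 0, 0],
                   'above_hi': [0, 0, -1, 0]})
df['pos'] = np.nan
df.loc[0, 'pos'] = 0

def step(d):
    # One pass: fill NaNs in 'pos' from the previous row's 'pos'
    # and the current row's 'below_lo' / 'above_hi'.
    prev = d['pos'].shift(1)
    filler = prev + (prev == 0) * d['below_lo'] + (prev == 1) * d['above_hi']
    return d.assign(pos=d['pos'].fillna(filler))

once = step(df)
twice = step(once)
print(once['pos'].isna().sum())   # NaNs left after one pass
print(twice['pos'].isna().sum())  # NaNs left after two passes
```

Each pass can only fill a row whose predecessor is already known, so the number of remaining NaNs drops by one per pass. Applying the step len(df) times, as reduce does, is therefore always enough.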