Pandas: create a new column based on the row above in the newly created column
I have a two-column numerical dataframe, and I'm trying to add a third column.
Row col1 col2
0 8 8
1 8 4
2 6 2
3 3 7
4 6 4
5 2 6
In the first row, col3 = max(col1 - col2, 0)
and in the remaining rows, col3 = max(col1 - col2 + col3_of_the_row_above, 0)
The resulting dataframe should look like this:
Row col1 col2 col3
0 8 8 0
1 8 4 4
2 6 2 8
3 3 7 4
4 6 4 6
5 2 6 2
Is there an efficient way to do this?
To create a new column, you can just do this:
df['col3'] = 0 # all the rows will be filled with zeros
col3 will be added to your dataframe.
Because the first row is calculated differently from the others, you'll need to do it separately.
df.loc[0, 'col3'] = max(df.loc[0, 'col1'] - df.loc[0, 'col2'], 0)  # .loc avoids chained assignment (df['col3'][0] = ...), which may not write back to df
The calculation for the other rows is the same, so you can do it with a for loop.
for row in range(1, len(df)):
    # Label-based .loc; assumes the default 0..n-1 index
    df.loc[row, 'col3'] = max(df.loc[row, 'col1'] - df.loc[row, 'col2'] + df.loc[row - 1, 'col3'], 0)
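For larger frames, a common speed-up is to run the same loop over plain NumPy arrays instead of reading and writing the DataFrame cell by cell, since every `.loc` access carries pandas overhead (it is still a Python-level loop, just a cheaper one). A minimal sketch on the question's sample data; `running_clipped_total` is a hypothetical helper name. Starting the running total at 0 also makes the first-row rule a special case of the general one, so no separate first-row step is needed.

```python
import numpy as np
import pandas as pd


def running_clipped_total(col1, col2):
    """Hypothetical helper: col3[i] = max(col1[i] - col2[i] + col3[i-1], 0).

    With the running total starting at 0, row 0 reduces to max(col1 - col2, 0).
    """
    diffs = np.asarray(col1) - np.asarray(col2)  # plain NumPy array, no per-cell pandas indexing
    out = np.empty(len(diffs), dtype=diffs.dtype)
    total = 0
    for i, d in enumerate(diffs):
        total = max(total + d, 0)  # clip the running total at zero on every step
        out[i] = total
    return out


# Sample data from the question, with the index named 'Row'
df = pd.DataFrame(
    {'col1': [8, 8, 6, 3, 6, 2], 'col2': [8, 4, 2, 7, 4, 6]},
    index=pd.RangeIndex(6, name='Row'),
)
df['col3'] = running_clipped_total(df['col1'], df['col2'])
print(df)
```

Because the helper takes positions rather than index labels, it also works when the index is not 0..n-1.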
PS: You can also do this with a list comprehension. Maybe it's a bit early for that, but I'll include the code so you can study it. A comprehension can't read col3 while it is still being built (at that point the column is still all zeros, so col3_of_the_row_above would always be 0), so the running total is carried in a variable with an assignment expression (Python 3.8+):
prev = 0  # running total carried over from the row above
df['col3'] = [(prev := max(prev + diff, 0)) for diff in df['col1'] - df['col2']]
This is a more Pythonic way to do it, but it can be a little confusing at first sight.
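The same recurrence can also be written with `itertools.accumulate` from the standard library, which feeds each previous result back into a two-argument function. This swaps in a different technique rather than a list comprehension; a minimal sketch on the question's sample data, assuming Python 3.8+ for the `initial` argument:

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'col1': [8, 8, 6, 3, 6, 2], 'col2': [8, 4, 2, 7, 4, 6]})

# accumulate() passes the previous result in as `prev`; initial=0 makes the
# first row come out as max(col1 - col2, 0), the same as the question's rule.
totals = accumulate(
    df['col1'] - df['col2'],
    lambda prev, diff: max(prev + diff, 0),
    initial=0,
)
df['col3'] = list(totals)[1:]  # drop the leading 0 that `initial` prepends
print(df)
```

Passing `initial=0` matters: without it, `accumulate` would use the first difference unclipped, which is wrong whenever col1 < col2 in the first row.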
Try this:
# Calculate the value for the first row, clipping negatives to zero
s = (df.iloc[0, df.columns.get_loc('col1')] - df.iloc[0, df.columns.get_loc('col2')]).clip(0,)
# Calculate difference for each row after first
df['col3'] = (df.iloc[1:, df.columns.get_loc('col1')] - df.iloc[1:, df.columns.get_loc('col2')])
# Fill 'col3' with first value then cumsum differences
df['col3'] = df['col3'].fillna(s).cumsum()
df
Output:
col1 col2 col3
Row
0 8 8 0.0
1 8 4 4.0
2 6 2 8.0
3 3 7 4.0
4 6 4 6.0
5 2 6 2.0
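Note that this cumsum approach applies the max(..., 0) clip only to the first row, not at every step. It reproduces the expected output above because the running total never drops below zero in this sample. The hypothetical data below (not from the question) shows where a plain cumulative sum and the question's rule diverge:

```python
import pandas as pd

# Hypothetical data where the running total would dip below zero
df = pd.DataFrame({'col1': [5, 1, 4], 'col2': [2, 6, 1]})
diffs = df['col1'] - df['col2']            # 3, -5, 3

plain = diffs.cumsum()                      # 3, -2, 1  (no clipping after row 0)

clipped = []
total = 0
for d in diffs:
    total = max(total + d, 0)               # 3, 0, 3   (the rule from the question)
    clipped.append(int(total))

print(plain.tolist(), clipped)
```

If your data can push the running total negative, use one of the clipped-loop versions instead of cumsum.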