
Pandas: create a new column based on the row above in the newly created column

I have a two-column numerical dataframe, and I'm trying to add a third column.

    Row    col1    col2
 
    0      8       8      
    1      8       4   
    2      6       2   
    3      3       7   
    4      6       4   
    5      2       6  

In the first row, col3 = max(col1 - col2, 0); in every other row, col3 = max(col1 - col2 + col3_of_the_row_above, 0).

The resulting dataframe should look like this:

    Row    col1    col2    col3
 
    0      8       8       0   
    1      8       4       4
    2      6       2       8
    3      3       7       4
    4      6       4       6
    5      2       6       2
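To reproduce the example, the sample data can be loaded as below. This is a sketch that assumes Row is the DataFrame index (as the output in the second answer suggests); the helper follows_rule is hypothetical and only checks the expected col3 against the rule stated above.

```python
import pandas as pd

# Sample data from the question; 'Row' is assumed to be the index
df = pd.DataFrame(
    {"col1": [8, 8, 6, 3, 6, 2], "col2": [8, 4, 2, 7, 4, 6]},
    index=pd.Index(range(6), name="Row"),
)
expected_col3 = [0, 4, 8, 4, 6, 2]


def follows_rule(col1, col2, col3):
    """Hypothetical helper: check col3 against
    col3[0] = max(col1[0] - col2[0], 0) and
    col3[i] = max(col1[i] - col2[i] + col3[i - 1], 0) for i > 0."""
    prev = 0  # treating the "row above" the first row as 0 reproduces the first-row rule
    for a, b, c in zip(col1, col2, col3):
        if c != max(a - b + prev, 0):
            return False
        prev = c
    return True


print(follows_rule(df["col1"], df["col2"], expected_col3))  # True
```

Treating the missing "row above" as 0 lets one formula cover both cases, since max(col1 - col2 + 0, 0) is exactly the first-row rule.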

Is there an efficient way to do this?

To create a new column you can just do this:

 df['col3'] = 0 # all the rows will be filled with zeros

col3 will be added to your dataframe.

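Because the first row is calculated differently from the others, you'll need to set it separately. Use .loc rather than chained indexing such as df['col3'][0] = ...: chained assignment can trigger a warning and, with copy-on-write enabled (the default starting with pandas 3.0), it does not update df at all.

df.loc[0, 'col3'] = max(df.loc[0, 'col1'] - df.loc[0, 'col2'], 0)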

The other rows all use the same calculation, so you can fill them with a for loop:

 for row in range(1, len(df)):
     df.loc[row, 'col3'] = max(df.loc[row, 'col1'] - df.loc[row, 'col2'] + df.loc[row - 1, 'col3'], 0)
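Writing one cell at a time inside a Python loop is slow on large frames, because every assignment goes through pandas indexing. A minimal sketch of the same loop run over plain NumPy arrays, with the whole column assigned once at the end (the function name clipped_running_sum is hypothetical):

```python
import numpy as np
import pandas as pd


def clipped_running_sum(col1, col2):
    """Hypothetical helper: col3[i] = max(col1[i] - col2[i] + col3[i - 1], 0),
    with the row above the first row treated as 0."""
    diff = np.asarray(col1) - np.asarray(col2)
    out = np.empty(len(diff), dtype=diff.dtype)
    prev = 0
    for i, d in enumerate(diff):
        prev = max(prev + d, 0)  # add this row's difference, then clip at zero
        out[i] = prev
    return out


df = pd.DataFrame({"col1": [8, 8, 6, 3, 6, 2], "col2": [8, 4, 2, 7, 4, 6]})
df["col3"] = clipped_running_sum(df["col1"], df["col2"])
print(df["col3"].tolist())  # [0, 4, 8, 4, 6, 2]
```

The loop is still plain Python, but skipping per-cell pandas indexing is typically much faster than .loc writes; it also keeps col3 as integers.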

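PS: You can also do this with a list comprehension. It may be a bit early for that, but I'll include it so you can study it. Because each value depends on the one computed just before it, the comprehension has to carry that value itself: reading df['col3'][row - 1] inside it would only see the zeros assigned earlier, since the column is not updated until the whole list has been built. An assignment expression (Python 3.8+) keeps the running value:

prev = 0  # col3 of the "row above" the first row
df['col3'] = [prev := max(c1 - c2 + prev, 0) for c1, c2 in zip(df['col1'], df['col2'])]

This is more compact, but it can be a little confusing at first sight.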
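Another standard-library option for this "each value depends on the previous result" pattern is itertools.accumulate. A sketch, assuming Python 3.8+ for the initial argument:

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({"col1": [8, 8, 6, 3, 6, 2], "col2": [8, 4, 2, 7, 4, 6]})

# Each step adds the current row's difference to the previous result and
# clips at zero; initial=0 plays the role of the "row above" the first row.
steps = accumulate(df["col1"] - df["col2"], lambda prev, d: max(prev + d, 0), initial=0)

# Drop the leading initial value so the result lines up with the rows
df["col3"] = list(steps)[1:]
print(df["col3"].tolist())  # [0, 4, 8, 4, 6, 2]
```

Because the seed value is emitted first, the [1:] slice is what keeps the output the same length as the DataFrame.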

Try this:

# Value for the first row, with the lower bound clipped to zero
s = max(df['col1'].iloc[0] - df['col2'].iloc[0], 0)

# Difference col1 - col2 for every row after the first (row 0 becomes NaN)
df['col3'] = df['col1'].iloc[1:] - df['col2'].iloc[1:]

# Fill the first row with s, then take the cumulative sum of the differences.
# Note: cumsum does not re-apply max(..., 0) after the first row, so this matches
# the required result only while the running total stays non-negative, as it does here.
df['col3'] = df['col3'].fillna(s).cumsum()

df

Output:

     col1  col2  col3
Row                  
0       8     8   0.0
1       8     4   4.0
2       6     2   8.0
3       3     7   4.0
4       6     4   6.0
5       2     6   2.0
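A fully vectorized variant that applies the clip at every row, not just the first, can use the closed form of a running sum floored at zero (the Lindley recursion from queueing theory): if S is the cumulative sum of col1 - col2, then col3 = S - min(0, running minimum of S). This is a swapped-in technique rather than part of the answer above; clipped_cumsum is a hypothetical name, and the second input in the usage lines is illustrative only.

```python
import numpy as np
import pandas as pd


def clipped_cumsum(diff):
    """Running sum of `diff` floored at zero after every step:
    out[i] = max(out[i - 1] + diff[i], 0), with out[-1] taken as 0.
    Uses out = S - min(0, cummin(S)), where S = diff.cumsum()."""
    s = diff.cumsum()
    return s - np.minimum(s.cummin(), 0)


df = pd.DataFrame({"col1": [8, 8, 6, 3, 6, 2], "col2": [8, 4, 2, 7, 4, 6]})
df["col3"] = clipped_cumsum(df["col1"] - df["col2"])
print(df["col3"].tolist())  # [0, 4, 8, 4, 6, 2]

# Illustrative input where the running total would otherwise go negative
print(clipped_cumsum(pd.Series([-3, 5, -10, 4])).tolist())  # [0, 5, 0, 4]
```

Subtracting the most negative prefix sum seen so far "resets" the total at each point where it would have dropped below zero, and since no NaN is introduced, col3 stays an integer column.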
