
How to create new column and insert row values while iterating through pandas data frame

I am trying to create a function that iterates through a pandas dataframe row by row. I want to create a new column based on row values of other columns. My original dataframe could look like this:

df:

   A   B
0  1   2
1  3   4
2  2   2

Now I want to create a new column filled with the row values of Column A - Column B at each index position, so that the result looks like this:

df:

   A   B   A-B
0  1   2   -1
1  3   4   -1
2  2   2    0

The solution I have works, but only when I do NOT use it in a function:

for index, row in df.iterrows():
    print(index)
    df['A-B'] = df['A'] - df['B']

This gives me the desired output, but when I try to use it as a function, I get an error.

def test(x):
    for index, row in df.iterrows():
        print(index)
        df['A-B'] = df['A'] - df['B']
    return df

df.apply(test)

ValueError: cannot copy sequence with size 4 to array axis with dimension 3

What am I doing wrong here and how can I get it to work?

This happens because the apply method works column by column by default. Set axis to 1 if you want to go through the rows instead:

axis : {0 or 'index', 1 or 'columns'}, default 0

  • 0 or 'index': apply function to each column
  • 1 or 'columns': apply function to each row

df.apply(test, axis=1)
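
To make the difference between the two axis values concrete, here is a minimal sketch using the sample data from the question. The helper name describe is hypothetical and only reports the index labels of the Series that apply passes in.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

def describe(series):
    # Hypothetical helper: join the index labels of the Series that
    # apply() hands to the function, so we can see what it received.
    return ",".join(str(label) for label in series.index)

# axis=0 (the default): one call per column, each Series is indexed by row label.
per_column = df.apply(describe)

# axis=1: one call per row, each Series is indexed by column name.
per_row = df.apply(describe, axis=1)

print(per_column)
print(per_row)
```

With the default axis the function sees a whole column at a time, so a lookup such as x['A'] only makes sense inside the function once axis=1 is set.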

EDIT

I assumed you needed to do some complex manipulation on each row. If you just need to subtract one column from the other:

df['A-B'] = df.A - df.B

As Anton indicated, you should call apply with the axis=1 parameter. However, there is no need to loop through the rows inside test as you did, since the apply documentation states:

Objects passed to functions are Series objects

So you could simplify the function to:

def test(x):
    x['A-B']=x['A']-x['B']
    return x

and then run:

df.apply(test,axis=1)

Note that you named the parameter of test x but never used x anywhere in the function.

Finally, note that pandas supports column-wise operations directly (i.e. without a for loop), so you can simply write:

df['A-B']=df['A']-df['B']
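
Since the question asks for a function, the column-wise subtraction can be wrapped in one. This is only a sketch: the function name add_difference is hypothetical, and copying the input frame is a design choice made here so the caller's DataFrame is not changed as a side effect.

```python
import pandas as pd

def add_difference(frame):
    # Hypothetical helper: return a copy of frame with an extra 'A-B' column.
    out = frame.copy()
    # Column-wise subtraction; pandas aligns the two columns on the index.
    out["A-B"] = out["A"] - out["B"]
    return out

# Sample data from the question.
df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})
df = add_difference(df)
print(df)
```

This version calls the function once on the whole DataFrame instead of passing it to apply, so no per-row Python function call is involved.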

