How to iterate through a column in dataframe and update two new columns simultaneously?

Question

I understand I can add a column to a dataframe and update its values to the values returned from a function, like this:

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def square(x):
    return x*x

df['x_squared'] = [square(i) for i in df['x']]

However, my actual function returns two items, and I want to put them in two different new columns. Here is some pseudo-code that describes the problem more clearly:

df=pd.DataFrame({'x':[1,2,3,4]})

def squareAndCube(x):
    return x*x, x*x*x

# the line below is pseudo-code; it does not work as written
df['x_squared'], df['x_cubed'] = [squareAndCube(i) for i in df['x']]

The code above raises an error saying "too many values to unpack". How should I fix this?

Answer 1

You could do it in a vectorized fashion:

df['x_squared'], df['x_cubed'] = df.x**2,df.x**3

Or with your custom function, which also works on a whole column because * is applied element-wise:

df['x_squared'], df['x_cubed'] = squareAndCube(df.x)

Back to your loop-based version. The right-hand side of the assignment evaluates to:

In [101]: [squareAndCube(i) for i in df['x']]
Out[101]: [(1, 1), (4, 8), (9, 27), (16, 64)]

On the left-hand side you had df['x_squared'], df['x_cubed'], so Python tries to unpack the list into exactly two values: the first is expected to hold the squares of all the rows, and the second the cubes of all the rows. The list shown above instead holds four tuples, one (square, cube) pair per row, which is why unpacking fails with "too many values to unpack". The fix is to "transpose" that list so that it holds two sequences, one per new column:

In [102]: L = [squareAndCube(i) for i in df['x']]

In [103]: list(map(list, zip(*L)))  # Transposed list (the outer list() is needed on Python 3, where map is lazy)
Out[103]: [[1, 4, 9, 16], [1, 8, 27, 64]]

In [104]: df['x_squared'], df['x_cubed'] = map(list, zip(*L))
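For reference, the same transposition can be sketched as a self-contained Python 3 script. This is only a minimal illustration of the fix above; the names pairs, squares and cubes are hypothetical and not part of the original code.

```python
import pandas as pd

def squareAndCube(x):
    return x * x, x * x * x

df = pd.DataFrame({'x': [1, 2, 3, 4]})

# One (square, cube) tuple per row
pairs = [squareAndCube(i) for i in df['x']]

# zip(*pairs) regroups the row-wise tuples column-wise:
# one tuple holding all squares, one holding all cubes
squares, cubes = zip(*pairs)

df['x_squared'] = list(squares)
df['x_cubed'] = list(cubes)
```

Unpacking works here because zip(*pairs) yields exactly two items, matching the two names on the left.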

Or, for the love of NumPy broadcasting:

df['x_squared'], df['x_cubed'] = (df.x.values[:,None]**[2,3]).T
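To see what the broadcasting one-liner does, it can help to split it into steps and look at the shapes. The sketch below assumes the same four-row frame as in the question; the intermediate names col, exponents and powers are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

col = df['x'].values[:, None]   # shape (4, 1): one row per value of x
exponents = np.array([2, 3])    # shape (2,)
powers = col ** exponents       # broadcasts to shape (4, 2)

# Transposing gives shape (2, 4), so unpacking yields one row of
# squares and one row of cubes
df['x_squared'], df['x_cubed'] = powers.T
```

The (4, 1) column is stretched against the (2,) exponents, so every value of x is raised to every exponent in a single operation.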

Answer 2

This works for positive numbers only, because it takes logarithms: it computes x**k as exp(k * log(x)). I'm still thinking about how to generalize it, but the brevity of this solution has me distracted.

import numpy as np
import pandas as pd

df = pd.DataFrame(range(1, 10))
a = np.arange(1, 4).reshape(1, -1)

np.exp(np.log(df).dot(a))
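As a rough check of what this expression produces, the sketch below compares it with powers computed directly through broadcasting. The names via_logs and direct are hypothetical, and the direct form is offered only as one possible way to drop the positive-numbers restriction, not as the answer's own method.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(range(1, 10))        # single column holding 1..9
a = np.arange(1, 4).reshape(1, -1)     # exponents [[1, 2, 3]]

# k * log(x) followed by exp gives x**k, valid for x > 0
via_logs = np.exp(np.log(df).dot(a))

# The same table computed directly; integer powers computed this
# way also accept zero and negative values of x
direct = df.values ** a                # (9, 1) ** (1, 3) -> (9, 3)

print(np.allclose(via_logs.values, direct))
```

Each row of the result holds x, x squared and x cubed for one input value.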

Answer 3

How about using df.loc like this:

df=pd.DataFrame({'x':[1,2,3,4]})

def squareAndCube(x):
    return x*x, x*x*x

df['x_squared'] = df['x_cubed'] = None
df.loc[:, ['x_squared', 'x_cubed']] = [squareAndCube(i) for i in df['x']]

gives

   x  x_squared  x_cubed
0  1          1        1
1  2          4        8
2  3          9       27
3  4         16       64

This is very close to what you had, but the columns need to exist for df.loc to work.

For the uninitiated, df.loc takes two indexers: a row selector (in this case :, which means all rows) and a column selector (here the list ['x_squared', 'x_cubed']).
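If pre-creating the columns with None feels awkward, one possible alternative is to build a small DataFrame from the list of tuples and join it onto the original. This is a different technique from the df.loc assignment above, sketched under the assumption that the new frame should share df's index; the name new_cols is only illustrative.

```python
import pandas as pd

def squareAndCube(x):
    return x * x, x * x * x

df = pd.DataFrame({'x': [1, 2, 3, 4]})

# Build a two-column frame from the (square, cube) tuples,
# reusing df's index so the rows line up
new_cols = pd.DataFrame(
    [squareAndCube(i) for i in df['x']],
    columns=['x_squared', 'x_cubed'],
    index=df.index,
)

# join aligns on the index and returns a new frame
df = df.join(new_cols)
```

Because the columns are created from integers directly, they should come out with an integer dtype rather than the object dtype that a None placeholder can leave behind.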

3 answers

solution1
3 ACCEPTED 2016-07-20 18:31:43

solution2
1 2016-07-20 20:23:14

solution3
0 2016-07-20 19:20:27
