
How to iterate through a column in dataframe and update two new columns simultaneously?

I understand I can add a column to a dataframe and update its values to the values returned from a function, like this:

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def square(x):
    return x*x

df['x_squared'] = [square(i) for i in df['x']]

However, my actual function returns two values, and I want to put them in two different new columns. Here is some pseudo-code that describes the problem more clearly:

df=pd.DataFrame({'x':[1,2,3,4]})

def squareAndCube(x):
    return x*x, x*x*x

# pseudo-code: this is what I would like to write
df['x_squared'], df['x_cubed'] = [squareAndCube(i) for i in df['x']]

The code above raises an error saying "too many values to unpack". How should I fix this?

You could do it in a vectorized fashion, like so -

df['x_squared'], df['x_cubed'] = df.x**2,df.x**3

Or with that custom function, which works on the whole column at once because * is applied element-wise to a Series, like so -

df['x_squared'], df['x_cubed'] = squareAndCube(df.x)

Back to your loopy case, on the right side of the assignment, you had :

In [101]: [squareAndCube(i) for i in df['x']]
Out[101]: [(1, 1), (4, 8), (9, 27), (16, 64)]

Now, on the left side, you had df['x_squared'], df['x_cubed'] = . So, the unpacking expects exactly two items: the squares of all the rows first, then the cubes of all the rows. The list shown above instead has four items, one (square, cube) pair per row, which is why you get "too many values to unpack". So, the fix is to "transpose" that list and assign the two resulting lists as the new columns. Thus, the fix would be -

In [102]: L = [squareAndCube(i) for i in df['x']]

In [103]: list(map(list, zip(*L)))  # Transposed list; the outer list() is needed on Python 3, where map is lazy
Out[103]: [[1, 4, 9, 16], [1, 8, 27, 64]]

In [104]: df['x_squared'], df['x_cubed'] = map(list, zip(*L))
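
As a self-contained version of the transpose fix above, here is a minimal sketch that should run as a plain script on Python 3. The names pairs, squares and cubes are introduced here for illustration only and are not from the original answer.

```python
import pandas as pd


def squareAndCube(x):
    return x * x, x * x * x


df = pd.DataFrame({'x': [1, 2, 3, 4]})

# One (square, cube) tuple per row: four items in total.
pairs = [squareAndCube(i) for i in df['x']]

# zip(*pairs) regroups the tuples column-wise: two items in total,
# which is what the two-target assignment needs.
squares, cubes = zip(*pairs)

df['x_squared'] = list(squares)
df['x_cubed'] = list(cubes)
print(df)
```

Unpacking zip(*pairs) directly avoids the extra map(list, ...) step; converting to lists before assignment is just a conservative choice.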

For the love of NumPy broadcasting!

df['x_squared'], df['x_cubed'] = (df.x.values[:,None]**[2,3]).T
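
To make the shapes in that one-liner explicit, here is a step-by-step sketch of the same broadcasting idea. The intermediate names col, exponents and powers are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

col = df['x'].to_numpy()[:, None]   # column vector, shape (4, 1)
exponents = np.array([2, 3])        # shape (2,)

# Broadcasting pairs every row with every exponent: shape (4, 2).
powers = col ** exponents

# Transposing gives shape (2, 4), so unpacking yields one array per new column.
df['x_squared'], df['x_cubed'] = powers.T
print(df)
```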

This works for positive numbers only, since it takes a logarithm. I am still thinking about how to generalize it, but the brevity of this solution has me distracted.

import numpy as np
import pandas as pd

df = pd.DataFrame(range(1, 10))
a = np.arange(1, 4).reshape(1, -1)

np.exp(np.log(df).dot(a))
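
One possible way around the positive-only restriction, offered as a sketch and not as the answer author's own generalization, is to skip the logarithm and raise the column to the exponents directly with broadcasting. The sample values and the names powers and result are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with negative values and zero, where np.log would not be usable.
df = pd.DataFrame([-2, -1, 0, 1, 2])
a = np.arange(1, 4).reshape(1, -1)   # exponents 1, 2, 3 with shape (1, 3)

# (5, 1) ** (1, 3) broadcasts to (5, 3); integer inputs stay exact integers.
powers = df.to_numpy() ** a

result = pd.DataFrame(powers, index=df.index,
                      columns=['x', 'x_squared', 'x_cubed'])
print(result)
```

Unlike the exp/log version, this keeps integer inputs as integers instead of returning floating-point approximations.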


How about using df.loc like this:

df=pd.DataFrame({'x':[1,2,3,4]})

def squareAndCube(x):
    return x*x, x*x*x

df['x_squared'] = df['x_cubed'] = None
df.loc[:, ['x_squared', 'x_cubed']] = [squareAndCube(i) for i in df['x']]

gives

   x  x_squared  x_cubed
0  1          1        1
1  2          4        8
2  3          9       27
3  4         16       64

This is very close to what you had, but the columns need to exist for df.loc to work.

For the uninitiated, df.loc takes two indexers: the rows you want to work on (in this case : , which means all of them) and the columns, here the list ['x_squared', 'x_cubed'].
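
If pre-creating the columns with None is undesirable (it may leave them with object dtype, which is worth checking on your pandas version), a different technique is to build the new columns as their own DataFrame and join it on. This is a sketch of that alternative, not the df.loc approach itself; the names pairs, new_cols and result are illustrative.

```python
import pandas as pd


def squareAndCube(x):
    return x * x, x * x * x


df = pd.DataFrame({'x': [1, 2, 3, 4]})

# One (square, cube) tuple per row.
pairs = [squareAndCube(i) for i in df['x']]

# Each tuple becomes a row; reusing df.index keeps the rows aligned.
new_cols = pd.DataFrame(pairs, columns=['x_squared', 'x_cubed'], index=df.index)

result = df.join(new_cols)
print(result)
```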


 