
How to iterate through a column in a dataframe and update two new columns simultaneously?

I understand I can add a column to a dataframe and set its values to those returned by a function, like this:

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def square(x):
    return x*x

df['x_squared'] = [square(i) for i in df['x']]

However, my actual function returns two items, and I want to put them in two different new columns. Here is some pseudo-code to describe the problem more clearly:

df=pd.DataFrame({'x':[1,2,3,4]})

def squareAndCube(x):
    return x*x, x*x*x

#below is a pseudo-code
df['x_squared'], df['x_cubed'] = [squareAndCube(i) for i in df['x']]

The code above gives me an error message saying "too many values to unpack". How should I fix this?

You could do it in a vectorized fashion, like so:

df['x_squared'], df['x_cubed'] = df.x**2,df.x**3

Or with that custom function, like so:

df['x_squared'], df['x_cubed'] = squareAndCube(df.x)
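As a minimal sketch of why the custom function can be called on the whole column: `*` on a pandas Series is elementwise, so `squareAndCube` returns a tuple of two Series that unpack straight into the two new columns. This assumes the function uses only elementwise arithmetic; a function that branches on its argument (for example `if x > 0`) would raise an error when handed a Series.

```python
import pandas as pd

def squareAndCube(x):
    # Works for a scalar or a Series, because * is elementwise on a Series
    return x * x, x * x * x

df = pd.DataFrame({'x': [1, 2, 3, 4]})

# The function returns a 2-tuple of Series: (squares, cubes)
squared, cubed = squareAndCube(df['x'])
df['x_squared'], df['x_cubed'] = squared, cubed
```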

Back to your loopy case: on the right side of the assignment, you had:

In [101]: [squareAndCube(i) for i in df['x']]
Out[101]: [(1, 1), (4, 8), (9, 27), (16, 64)]

Now, on the left side, you had df['x_squared'], df['x_cubed'] = , so Python expects the squared numbers of all the rows as the first value to assign. The first element of the list shown above isn't that; it's the square and cube of the first row. So the fix is to "transpose" that list and assign the result as the new columns:

In [102]: L = [squareAndCube(i) for i in df['x']]

In [103]: list(map(list, zip(*L)))  # Transposed list; list() is needed on Python 3, where map is lazy
Out[103]: [[1, 4, 9, 16], [1, 8, 27, 64]]

In [104]: df['x_squared'], df['x_cubed'] = map(list, zip(*L))
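The same idea as a self-contained Python 3 script, as a rough sketch: it first reproduces the unpacking error (four row tuples cannot unpack into two names) and then transposes with `zip(*...)`. The variable names `pairs`, `squares`, and `cubes` are illustrative only.

```python
import pandas as pd

def squareAndCube(x):
    return x * x, x * x * x

df = pd.DataFrame({'x': [1, 2, 3, 4]})

# One (square, cube) tuple per row: 4 items, each of length 2
pairs = [squareAndCube(i) for i in df['x']]

# Unpacking 4 items into 2 names is what raises the original error
try:
    a, b = pairs
    unpack_failed = False
except ValueError:
    unpack_failed = True

# zip(*pairs) regroups by position: all squares first, then all cubes
squares, cubes = zip(*pairs)
df['x_squared'] = list(squares)
df['x_cubed'] = list(cubes)
```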

For the love of NumPy broadcasting!

df['x_squared'], df['x_cubed'] = (df.x.values[:,None]**[2,3]).T
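To unpack what that one-liner does, here is a step-by-step sketch with the intermediate shapes spelled out. The names `col` and `powers` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

# Add a trailing axis so the column has shape (4, 1)
col = df['x'].values[:, None]

# Broadcasting (4, 1) against (2,) gives shape (4, 2):
# column 0 holds the squares, column 1 holds the cubes
powers = col ** np.array([2, 3])

# Transposing to (2, 4) lets the two rows unpack into two columns
df['x_squared'], df['x_cubed'] = powers.T
```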

This works for positive numbers. I'm still thinking about how to generalize it, but the brevity of this solution has me distracted.

import numpy as np
import pandas as pd

df = pd.DataFrame(range(1, 10))
a = np.arange(1, 4).reshape(1, -1)

np.exp(np.log(df).dot(a))
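The trick relies on the identity exp(k * log(x)) = x**k, which is why it needs positive inputs: the real logarithm of a negative number is undefined, so NumPy yields NaN there. Below is a sketch adapted to the question's data, assuming only squares and cubes are wanted; the column labels are assigned by hand, and the results are floats that may differ from the exact integers by rounding error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

# Exponents as a (1, 2) row vector: one output column per exponent
exponents = np.array([[2, 3]])

# log(x) has shape (4, 1); the dot product scales it by each exponent,
# and exp() turns k * log(x) back into x**k
powers = np.exp(np.log(df[['x']]).dot(exponents))
powers.columns = ['x_squared', 'x_cubed']
```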


How about using df.loc like this:

df=pd.DataFrame({'x':[1,2,3,4]})

def squareAndCube(x):
    return x*x, x*x*x

df['x_squared'] = df['x_cubed'] = None
df.loc[:, ['x_squared', 'x_cubed']] = [squareAndCube(i) for i in df['x']]

gives

   x  x_squared  x_cubed
0  1          1        1
1  2          4        8
2  3          9       27
3  4         16       64

This is very close to what you had, but the columns need to exist before df.loc can assign to them.

For the uninitiated, df.loc takes two indexers: the rows you want to work on (in this case : , which means all of them) and a list of columns, here ['x_squared', 'x_cubed'].
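A small sketch of the two df.loc indexers in action. It departs slightly from the answer above: the columns are pre-created with 0 instead of None so they start with an integer dtype, and the row results are wrapped in a NumPy array of shape (4, 2). The boolean row selector at the end is an extra illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def squareAndCube(x):
    return x * x, x * x * x

df = pd.DataFrame({'x': [1, 2, 3, 4]})

# The target columns must exist before df.loc can assign to them
df['x_squared'] = 0
df['x_cubed'] = 0

# One row of (square, cube) per input row: shape (4, 2)
values = np.array([squareAndCube(i) for i in df['x']])

# Row indexer ':' selects every row; the list selects the two columns
df.loc[:, ['x_squared', 'x_cubed']] = values

# The row indexer can also be a boolean mask
subset = df.loc[df['x'] > 2, ['x', 'x_cubed']]
```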

