Iterate over columns of Pandas dataframe and create new variables

I am having trouble figuring out how to iterate over variables in a pandas dataframe and perform the same arithmetic operation on each.

I have a dataframe df that contains three numeric variables x1, x2 and x3. I want to create three new variables by multiplying each by 2. Here's what I am doing:

existing = ['x1','x2','x3']
new = ['y1','y2','y3']

for i in existing:
    for j in new:
        df[j] = df[i]*2

The code above does in fact create three new variables y1, y2 and y3 in the dataframe. But the values of y1 and y2 are being overridden by the values of y3, so all three variables have the same values, corresponding to those of y3. I am not sure what I am missing.

I would really appreciate any guidance or suggestions. Thanks.

You are looping 9 times here: 3 times for each existing column, with each iteration overwriting the previous one. By the end, y1, y2 and y3 all hold df['x3']*2.
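To see the overwrite concretely, here is a small sketch that records which source column gets written to which target column by the nested loop. It does not need pandas; the names assignments and final_source are only illustrative.

```python
existing = ['x1', 'x2', 'x3']
new = ['y1', 'y2', 'y3']

# Record every (target, source) pair the nested loop would assign,
# in the same order as: for i in existing: for j in new: df[j] = df[i]*2
assignments = [(j, i) for i in existing for j in new]
print(len(assignments))

# Building a dict keeps only the last write per target,
# just like repeated df[j] = ... assignments do.
final_source = dict(assignments)
print(final_source)
```

Every target ends up paired with x3, which is why the three new columns are identical.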

You may want something like

for e, n in zip(existing,new):
    df[n] = df[e]*2
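As a quick check, the zip loop can be run end to end on a small frame. The sample values below are made up for illustration, since the question does not show the real data.

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real values.
df = pd.DataFrame({'x1': [1, 2], 'x2': [3, 4], 'x3': [5, 6]})

existing = ['x1', 'x2', 'x3']
new = ['y1', 'y2', 'y3']

# zip pairs x1 with y1, x2 with y2, x3 with y3: one assignment per pair.
for e, n in zip(existing, new):
    df[n] = df[e] * 2

print(df)
```

Each new column now depends only on its own source column, so nothing is overwritten.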

I would do something more generic

# existing = ['x1', 'x2', 'x3']
# assumes every column is numeric and has an 'x' in its name
existing = df.columns
# Index has no element-wise .replace; use the .str accessor instead
new = existing.str.replace('x', 'y', regex=False)

# zip yields the column names themselves, so index df with them directly
for col_existing, col_new in zip(existing, new):
    df[col_new] = df[col_existing] * 2
# DataFrame.assign may offer a more elegant way
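The assign idea mentioned in the last comment could look roughly like this. It is only a sketch: the sample data and the names doubled and result are assumptions, and the 'x' to 'y' renaming follows the convention used in this answer.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'x1': [1, 2], 'x2': [3, 4], 'x3': [5, 6]})
existing = ['x1', 'x2', 'x3']

# Build {new_name: doubled_column} and unpack it into assign.
doubled = {name.replace('x', 'y'): df[name] * 2 for name in existing}

# assign returns a new DataFrame and leaves df itself unchanged.
result = df.assign(**doubled)
print(result)
```

Because assign returns a copy, this version fits well in a method chain. Use the loop if you want to modify df in place.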

You can concatenate the original DataFrame with the columns with doubled values:

cols_to_double = ['x0', 'x1', 'x2']
new_cols = list(df.columns) + [c.replace('x', 'y') for c in cols_to_double]

df = pd.concat([df, 2 * df[cols_to_double]], axis=1)
df.columns = new_cols

So, if your input df DataFrame is:

   x0  x1  x2  other0  other1
0   0   1   2       3       4
1   0   1   2       3       4
2   0   1   2       3       4
3   0   1   2       3       4
4   0   1   2       3       4

after executing the previous lines, you get:

   x0  x1  x2  other0  other1  y0  y1  y2
0   0   1   2       3       4   0   2   4
1   0   1   2       3       4   0   2   4
2   0   1   2       3       4   0   2   4
3   0   1   2       3       4   0   2   4
4   0   1   2       3       4   0   2   4

Here is the code to create df:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    data=np.column_stack([np.full((5,), i) for i in range(5)]),
    columns=[f'x{i}' for i in range(3)] + [f'other{i}' for i in range(2)]
)
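A possible variant of the concatenation approach renames the doubled columns before concatenating, so the new names are attached to the right columns regardless of column order. This is a sketch rather than the original answer's method; doubled and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Same example frame as above: columns x0, x1, x2, other0, other1.
df = pd.DataFrame(
    data=np.column_stack([np.full((5,), i) for i in range(5)]),
    columns=[f'x{i}' for i in range(3)] + [f'other{i}' for i in range(2)]
)

cols_to_double = ['x0', 'x1', 'x2']

# Double the selected columns and rename x* to y* before concatenating.
doubled = (2 * df[cols_to_double]).rename(columns=lambda c: c.replace('x', 'y'))
result = pd.concat([df, doubled], axis=1)

print(result)
```

Renaming first avoids reassigning df.columns as a whole, which would silently mislabel columns if the list of names were built in the wrong order.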
