Iterate over columns of Pandas dataframe and create new variables
I am having trouble figuring out how to iterate over variables in a pandas dataframe and perform the same arithmetic operation on each.
I have a dataframe df that contains three numeric variables x1, x2 and x3. I want to create three new variables by multiplying each by 2. Here's what I am doing:
existing = ['x1','x2','x3']
new = ['y1','y2','y3']
for i in existing:
    for j in new:
        df[j] = df[i]*2
The code above does in fact create three new variables y1, y2 and y3 in the dataframe. But the values of y1 and y2 are being overwritten by the values of y3, so all three variables have the same values, corresponding to those of y3. I am not sure what I am missing.
Really appreciate any guidance/suggestion. Thanks.
You are looping 9 times here: 3 times for each column, with each iteration overwriting the previous one.
You may want something like
for e, n in zip(existing,new):
    df[n] = df[e]*2
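To see the difference concretely, here is a minimal sketch that runs the nested loops and the zip version side by side. The sample values in df and the names nested and paired are assumptions for illustration; the question does not show the actual data.

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real values.
df = pd.DataFrame({'x1': [1, 2], 'x2': [10, 20], 'x3': [100, 200]})
existing = ['x1', 'x2', 'x3']
new = ['y1', 'y2', 'y3']

# Nested loops: 3 x 3 = 9 assignments. Every outer pass rewrites all of
# y1, y2, y3, so the last pass (x3) is what remains in every new column.
nested = df.copy()
for i in existing:
    for j in new:
        nested[j] = nested[i] * 2

# zip pairs x1 with y1, x2 with y2, x3 with y3: only 3 assignments.
paired = df.copy()
for e, n in zip(existing, new):
    paired[n] = paired[e] * 2

print(nested[new])
print(paired[new])
```

In nested, all three y columns hold the doubled x3 values, which matches the behaviour described in the question; in paired, each y column holds the doubled values of its own x column.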
I would do something more generic
# existing = ['x1','x2','x3']
existing = list(df.columns)  # snapshot the labels before new columns are added
# build each new name with str.replace (assumes every column name contains 'x')
new = [col.replace('x', 'y') for col in existing]
for col_existing, col_new in zip(existing, new):
    df[col_new] = df[col_existing]*2
# maybe there is a more elegant way using the pandas assign function
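The assign idea mentioned in the last comment could look roughly like the sketch below. DataFrame.assign accepts keyword arguments of the form new_name=values, so a dictionary of new columns can be unpacked into it. The sample data and the names doubled and result are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'x1': [1, 2], 'x2': [10, 20], 'x3': [100, 200]})

# Only touch columns whose name starts with 'x'.
existing = [c for c in df.columns if c.startswith('x')]

# Map each new name to the doubled column, then unpack into assign.
doubled = {c.replace('x', 'y', 1): df[c] * 2 for c in existing}
result = df.assign(**doubled)  # returns a new DataFrame; df is left unchanged

print(result)
```

Because assign returns a new DataFrame, the original df keeps only its x columns unless the result is assigned back to df.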
You can concatenate the original DataFrame with the columns with doubled values:
cols_to_double = ['x0', 'x1', 'x2']
new_cols = list(df.columns) + [c.replace('x', 'y') for c in cols_to_double]
df = pd.concat([df, 2 * df[cols_to_double]], axis=1, copy=True)
df.columns = new_cols
So, if your input df DataFrame is:
   x0  x1  x2  other0  other1
0   0   1   2       3       4
1   0   1   2       3       4
2   0   1   2       3       4
3   0   1   2       3       4
4   0   1   2       3       4
After executing the previous lines, you get:
   x0  x1  x2  other0  other1  y0  y1  y2
0   0   1   2       3       4   0   2   4
1   0   1   2       3       4   0   2   4
2   0   1   2       3       4   0   2   4
3   0   1   2       3       4   0   2   4
4   0   1   2       3       4   0   2   4
Here is the code to create df:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    data=np.column_stack([np.full((5,), i) for i in range(5)]),
    columns=[f'x{i}' for i in range(3)] + [f'other{i}' for i in range(2)]
)
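A possible variant of the same concatenation idea, as a sketch: rename the doubled columns before concatenating, so the column labels never have to be reassigned by position afterwards. The names doubled and result are assumptions introduced here; the rest reuses the df built above.

```python
import numpy as np
import pandas as pd

# Same example frame as above: columns x0, x1, x2, other0, other1.
df = pd.DataFrame(
    data=np.column_stack([np.full((5,), i) for i in range(5)]),
    columns=[f'x{i}' for i in range(3)] + [f'other{i}' for i in range(2)]
)
cols_to_double = ['x0', 'x1', 'x2']

# Double the selected columns and rename x* to y* in one step.
doubled = (2 * df[cols_to_double]).rename(columns=lambda c: c.replace('x', 'y'))

# Attach the renamed columns to the right of the original frame.
result = pd.concat([df, doubled], axis=1)

print(result)
```

Renaming first keeps each new label tied to the column it was derived from, which may be easier to follow than rebuilding the full list of column names.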