
Loops and creating new variables in R

I have a dataset with multiple years and variables. I could say how many of each there are, but I'm trying to write a script that runs without copying and pasting a block of code for every year/variable, so ideally the code would work regardless of those counts. Basically, each variable has an inflated counterpart, such as INCOME and INCOME_INFLATED, and I want to create a manually inflated version of INCOME (INCOME_MANUAL) and compare it to INCOME_INFLATED.

Here is an example of my input data:

year  income  income_inflated  CPIU
2000  1500    3000             2
2001  1000    1500             1.5
2002  2000    6000             3

Here is what I would like my output data to look like:

year  income  income_inflated  CPIU  income_manual
2000  1500    3000             2     3000
2001  1000    1500             1.5   1500
2002  2000    6000             3     6000

Here income_manual is income × CPIU. CPIU is a numeric variable with one value per year. This is easy for one or two variables, but I can't figure out how to do it for a list of 40+ variables without copying and pasting the code for each one.

I can easily create a list of the relevant variables; I just don't know how to write a loop that names and creates the new variables, so that a user can simply supply their data file and run the script.

This code successfully creates new data frames, one per year, named "data_[YEAR]". (years is a list of the unique values of the variable YEAR.)

library(dplyr)
# Create one data frame per year, named data_2000, data_2001, ...
for (y in years) {
  dy <- data %>% filter(YEAR == y)
  assign(paste0("data_", y), dy)
}
remove(dy)
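As an aside, instead of creating a separate object per year with assign(), a common R idiom is to keep the pieces together in one named list using base R's split(). A minimal sketch, assuming a data frame with a YEAR column; the values below are made up purely for illustration:

```r
# Made-up stand-in for the real dataset, which has a YEAR column
data <- data.frame(
  YEAR = c(2000, 2000, 2001, 2002),
  income = c(1500, 1200, 1000, 2000)
)

# split() returns a list of data frames, one per unique YEAR,
# named by the year values ("2000", "2001", "2002")
data_by_year <- split(data, data$YEAR)

# Prefix the names to mirror the data_[YEAR] naming used with assign()
names(data_by_year) <- paste0("data_", names(data_by_year))

# Access one year's data frame by name
data_by_year[["data_2001"]]
```

Keeping the years in a list means later steps can loop over them with lapply() instead of looking objects up by constructed names.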

But when I try to apply the same logic to a variable, it doesn't work. (vars is a list of the relevant variable names.)

for (v in vars[]) {
  data <- data %>% mutate(x = v * CPIU)
  assign(paste0(v, "_manual"), data$x)
}

It gives me the following error:

Error in `mutate()`:
! Problem while computing `x = v * CPIU`.
Caused by error in `v * CPIU`:
! non-numeric argument to binary operator

I'm used to doing these "create new objects" operations in bash scripts, but much less so in R, so I'm not sure how to refer to that kind of "dictionary". Essentially, how can I get R to treat v as the actual variable (column) rather than as a character string holding the variable's name?
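The error arises because, inside the loop, v is just the character string "income", so v * CPIU asks R to multiply text by a number, hence "non-numeric argument to binary operator". (Separately, assign() would create a free-standing vector named income_manual rather than a column of data.) A minimal base R sketch, assuming vars is a character vector of column names: look columns up with [[, which accepts a name stored in a variable, whereas $ only accepts a literal name. The data below is the question's example table.

```r
# Example data copied from the question's input table
data <- data.frame(
  year = c(2000, 2001, 2002),
  income = c(1500, 1000, 2000),
  income_inflated = c(3000, 1500, 6000),
  CPIU = c(2, 1.5, 3)
)

# Names of the columns to inflate; in practice this would list all 40+ variables
vars <- c("income")

for (v in vars) {
  # v is the string "income"; data[[v]] fetches the column with that name,
  # so the multiplication is numeric * numeric rather than text * numeric.
  # Assigning to data[[...]] adds the result as a new column of data.
  data[[paste0(v, "_manual")]] <- data[[v]] * data$CPIU
}

data
```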

In other words, I want to do the following operation:

data$income_manual <- data$income * data$CPIU

for many variables, without having to copy and paste this line over and over.
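If you would rather stay within dplyr, mutate() combined with across() can apply the same formula to every column named in a character vector and name the results with a glue-style .names pattern. A sketch assuming dplyr 1.0.0 or later (the version that introduced across()) and the question's example data; vars stands in for your list of 40+ column names:

```r
library(dplyr)

# Example data copied from the question's input table
data <- data.frame(
  year = c(2000, 2001, 2002),
  income = c(1500, 1000, 2000),
  income_inflated = c(3000, 1500, 6000),
  CPIU = c(2, 1.5, 3)
)

vars <- c("income")  # assumed: the names of the columns to inflate

# For each column in vars, compute column * CPIU and store it as <column>_manual;
# the original columns are left unchanged
data <- data %>%
  mutate(across(all_of(vars), ~ .x * CPIU, .names = "{.col}_manual"))

data
```

all_of() makes the selection fail loudly if a name in vars is not a column of data, which is useful when the list is long.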

Let me know if more detail or background is needed. Thanks so much.

I also know there are a lot of similar questions on here, but I can't figure out how to adapt their answers to my own work. I'm still relatively new to R, so I apologize for being a bit confused.

If I understand correctly, you can create a whole block of new data frame columns at once, because arithmetic on a data frame is applied to every column:

relevant_vars <- c("income", ...)

data[paste0(relevant_vars, "_manual")] <- data[relevant_vars] * data$CPIU

To demonstrate with mtcars:

relevant_vars <- names(mtcars)
mtcars$CPIU <- runif(nrow(mtcars))
  
mtcars[paste0(relevant_vars, "_manual")] <- mtcars[relevant_vars] * mtcars$CPIU

str(mtcars)
'data.frame':   32 obs. of  24 variables:
 $ mpg        : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl        : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp       : num  160 160 108 258 360 ...
 $ hp         : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat       : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt         : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec       : num  16.5 17 18.6 19.4 17 ...
 $ vs         : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am         : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear       : num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb       : num  4 4 1 1 2 1 4 2 2 4 ...
 $ CPIU       : num  0.699 0.616 0.111 0.658 0.957 ...
 $ mpg_manual : num  14.68 12.93 2.54 14.09 17.9 ...
 $ cyl_manual : num  4.194 3.695 0.446 3.95 7.658 ...
 $ disp_manual: num  111.8 98.5 12 169.9 344.6 ...
 $ hp_manual  : num  76.9 67.7 10.4 72.4 167.5 ...
 $ drat_manual: num  2.726 2.402 0.429 2.028 3.015 ...
 $ wt_manual  : num  1.831 1.771 0.259 2.117 3.293 ...
 $ qsec_manual: num  11.51 10.48 2.07 12.8 16.29 ...
 $ vs_manual  : num  0 0 0.111 0.658 0 ...
 $ am_manual  : num  0.699 0.616 0.111 0 0 ...
 $ gear_manual: num  2.796 2.464 0.446 1.975 2.872 ...
 $ carb_manual: num  2.796 2.464 0.111 0.658 1.915 ...
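The one-liner works because when a data frame is multiplied by a vector whose length equals the number of rows, R multiplies each column by that vector element-wise, so every row is scaled by its own CPIU. Applied to the question's example data, and taken one step further to the stated goal of comparing each _manual column with its _inflated counterpart, a sketch might look like this. The wages columns are hypothetical, added only to show more than one variable, and the check assumes every name in relevant_vars has a matching <name>_inflated column.

```r
# Question's example data, plus a hypothetical second variable (wages)
# to show several columns being handled at once
data <- data.frame(
  year = c(2000, 2001, 2002),
  income = c(1500, 1000, 2000),
  income_inflated = c(3000, 1500, 6000),
  wages = c(500, 800, 1000),              # hypothetical
  wages_inflated = c(1000, 1200, 3000),   # hypothetical
  CPIU = c(2, 1.5, 3)
)

relevant_vars <- c("income", "wages")

# Each selected column is multiplied element-wise by CPIU (one value per row)
data[paste0(relevant_vars, "_manual")] <- data[relevant_vars] * data$CPIU

# For each variable, check whether the manual column matches the *_inflated one;
# isTRUE(all.equal(...)) tolerates tiny floating-point differences
matches <- vapply(relevant_vars, function(v) {
  isTRUE(all.equal(data[[paste0(v, "_manual")]], data[[paste0(v, "_inflated")]]))
}, logical(1))

matches
```

matches is a named logical vector with one entry per variable (both TRUE for this data); a FALSE entry points to a variable whose manual inflation disagrees with the supplied inflated values.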

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.