How can I add more columns to a data frame with a for loop?

I am a beginner in R. I need to port some EViews code to R. The EViews code uses loops to add 10 or more columns (variables) to the data, each computed with some function.

Here is example EViews code that computes deflators:

for %x exp con gov inv cap ex im
frml def_{%x} = gdp_{%x}/gdp_{%x}_r*100
next 

I used the dplyr package and its mutate function, but adding many variables this way is very tedious:

library(dplyr)

# Simulated example data: 4 observations of each series
nominal_gdp <- rnorm(4)
nominal_inv <- rnorm(4)
nominal_gov <- rnorm(4)
nominal_exp <- rnorm(4)

real_gdp <- rnorm(4)
real_inv <- rnorm(4)
real_gov <- rnorm(4)
real_exp <- rnorm(4)

df <- data.frame(nominal_gdp, nominal_inv, nominal_gov, nominal_exp,
                 real_gdp, real_inv, real_gov, real_exp)

# One mutate() argument per series, each nominal / real * 100
df <- df %>% mutate(deflator_gdp = nominal_gdp / real_gdp * 100,
                    deflator_inv = nominal_inv / real_inv * 100,
                    deflator_gov = nominal_gov / real_gov * 100,
                    deflator_exp = nominal_exp / real_exp * 100)

print(df)

Please help me do this in R with a loop.

The answer is that your data is not as "tidy" as it could be.

This is what you have (with an added observation ID for clarity):

library(dplyr)

df <- data.frame(nominal_gdp = rnorm(4),
                 nominal_inv = rnorm(4),
                 nominal_gov = rnorm(4),
                 real_gdp = rnorm(4),
                 real_inv = rnorm(4),
                 real_gov = rnorm(4))
df <- df %>%
  mutate(obs_id = 1:n()) %>%
  select(obs_id, everything())

which gives:

   obs_id nominal_gdp nominal_inv nominal_gov    real_gdp   real_inv  real_gov
 1      1  -0.9692060  -1.5223055 -0.26966202  0.49057546  2.3253066 0.8761837
 2      2   1.2696927   1.2591910  0.04238958 -1.51398652 -0.7209661 0.3021453
 3      3   0.8415725  -0.1728212  0.98846942 -0.58743294 -0.7256786 0.5649908
 4      4  -0.8235101   1.0500614 -0.49308092  0.04820723 -2.0697008 1.2478635

Now consider having this instead, in df2:

   obs_id variable        real     nominal
1       1      gdp  0.49057546 -0.96920602
2       2      gdp -1.51398652  1.26969267
3       3      gdp -0.58743294  0.84157254
4       4      gdp  0.04820723 -0.82351006
5       1      inv  2.32530662 -1.52230550
6       2      inv -0.72096614  1.25919100
7       3      inv -0.72567857 -0.17282123
8       4      inv -2.06970078  1.05006136
9       1      gov  0.87618366 -0.26966202
10      2      gov  0.30214534  0.04238958
11      3      gov  0.56499079  0.98846942
12      4      gov  1.24786355 -0.49308092

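Then what you want to do is trivial. (The line below computes real / nominal; for the deflator as defined in the question, nominal / real * 100, swap the operands and multiply by 100.)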

df2 %>% mutate(deflator = real / nominal)
   obs_id variable        real     nominal    deflator
1       1      gdp  0.49057546 -0.96920602 -0.50616221
2       2      gdp -1.51398652  1.26969267 -1.19240392
3       3      gdp -0.58743294  0.84157254 -0.69801819
4       4      gdp  0.04820723 -0.82351006 -0.05853872
5       1      inv  2.32530662 -1.52230550 -1.52749012
6       2      inv -0.72096614  1.25919100 -0.57256297
7       3      inv -0.72567857 -0.17282123  4.19901294
8       4      inv -2.06970078  1.05006136 -1.97102841
9       1      gov  0.87618366 -0.26966202 -3.24919196
10      2      gov  0.30214534  0.04238958  7.12782060
11      3      gov  0.56499079  0.98846942  0.57158146
12      4      gov  1.24786355 -0.49308092 -2.53074800
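For the deflator as defined in the question (nominal divided by real, times 100), the same one-line mutate only needs the operands swapped. A minimal sketch on a small hand-made long-format data frame; the values are made up for illustration and are not the random numbers shown above:

```r
library(dplyr)

# Hypothetical long-format data: one row per observation and series
df2 <- data.frame(
  obs_id   = c(1, 2, 1, 2),
  variable = c("gdp", "gdp", "inv", "inv"),
  real     = c(100, 200, 50, 80),
  nominal  = c(110, 230, 55, 100)
)

# Deflator as in the EViews formula: nominal / real * 100
df2 <- df2 %>% mutate(deflator = nominal / real * 100)
df2
```

Because the series name lives in the variable column, this single line covers gdp, inv, gov, exp and any further series without listing them.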

So the question becomes: how do we get to the nice dplyr-compatible data.frame?

You need to gather your data using tidyr::gather. However, because you have two sets of variables to gather (the real and nominal values), it is not straightforward. I have done it in two steps, though there may be a better way.

real_vals <- df %>%
  select(obs_id, starts_with("real")) %>%
  # the line below is where the magic happens
  tidyr::gather(variable, real, starts_with("real")) %>%
  # extracting the variable name (by erasing up to the underscore)
  mutate(variable = gsub(variable, pattern = ".*_", replacement = ""))

# Same thing for nominal values
nominal_vals <- df %>%
  select(obs_id, starts_with("nominal")) %>%
  tidyr::gather(variable, nominal, starts_with("nominal")) %>%
  mutate(variable = gsub(variable, pattern = ".*_", replacement = ""))

# Merging them... Now we have something we can work with!
df2 <-
  full_join(real_vals, nominal_vals, by = c("obs_id", "variable"))

Note the importance of the observation ID when merging.
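In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), which can split each column name and produce both the real and nominal columns in one step through the special ".value" name. A sketch, assuming every measure column is named <nominal|real>_<series> with exactly one underscore (as in the example above); the data values are made up:

```r
library(dplyr)
library(tidyr)

# Wide data shaped like the example above (illustrative values)
df <- data.frame(
  obs_id      = 1:2,
  nominal_gdp = c(110, 230),
  nominal_inv = c(55, 100),
  real_gdp    = c(100, 200),
  real_inv    = c(50, 80)
)

# "nominal_gdp" is split at "_" into ".value" = "nominal" and variable = "gdp";
# ".value" means that piece becomes the name of an output column.
df2 <- df %>%
  pivot_longer(-obs_id,
               names_to  = c(".value", "variable"),
               names_sep = "_") %>%
  mutate(deflator = nominal / real * 100)
df2
```

If some names contain more than one underscore, names_sep would split them into too many pieces; names_pattern with a regular expression is the usual alternative in that case.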

We can grep the matching names and sort them:

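x <- colnames(df)
# Sorting both name sets puts nominal_<x> and real_<x> in the same order,
# so the two data frames are divided column by column
df[sort(x[grepl("^nominal", x)])] /
  df[sort(x[grepl("^real", x)])] * 100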
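Sorting only pairs the columns correctly if every nominal_ column has a real_ partner with the same suffix. A slightly more defensive base-R sketch pairs the columns by suffix explicitly and names the results deflator_<suffix>; the helper name add_deflators is hypothetical and the data are made up:

```r
# Hypothetical helper: add deflator_<x> = nominal_<x> / real_<x> * 100
# for every suffix x that has both a nominal_ and a real_ column.
add_deflators <- function(df) {
  x <- colnames(df)
  sfx <- intersect(sub("^nominal_", "", x[grepl("^nominal_", x)]),
                   sub("^real_", "", x[grepl("^real_", x)]))
  # Column-wise division of two data frames with matching column order
  defl <- df[paste0("nominal_", sfx)] / df[paste0("real_", sfx)] * 100
  names(defl) <- paste0("deflator_", sfx)
  cbind(df, defl)
}

df <- data.frame(nominal_gdp = c(110, 230), nominal_inv = c(55, 100),
                 real_gdp = c(100, 200), real_inv = c(50, 80))
res <- add_deflators(df)
res
```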

Similarly, if the columns were already in matching order, we could simply do:

df[ 1:4 ] / df[ 5:8 ] * 100
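Since the question asks for a loop, a literal translation of the EViews for %x ... next block into a base-R for loop can be sketched as follows. It assumes the nominal_/real_ naming from the question's R code, and the list of series is written out by hand, just as in the EViews loop; the data values are illustrative:

```r
df <- data.frame(nominal_gdp = c(110, 230), nominal_inv = c(55, 100),
                 real_gdp = c(100, 200), real_inv = c(50, 80))

# Counterpart of: for %x gdp inv ... frml def_{%x} = ... next
for (x in c("gdp", "inv")) {
  # df[[name]] <- value creates (or overwrites) a column by name
  df[[paste0("deflator_", x)]] <- df[[paste0("nominal_", x)]] /
    df[[paste0("real_", x)]] * 100
}
df
```

Adding one column per iteration is fine for a handful of series; the vectorized and long-format approaches avoid the explicit loop altogether.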

We can loop over the column names with purrr::map_dfc, then apply a custom function to the selected columns (i.e. the columns that match the current name from nms):

library(dplyr)
library(purrr)
# deflator_fun (defined below) must exist before this call
# Strip everything up to and including "_" to get the series names
nms <- unique(sub('.*_', '', names(df)))
# Use map() instead if you need the output as a list rather than a data frame
map_dfc(nms, ~deflator_fun(df, .x))

Custom function:

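deflator_fun <- function(df, x){
  nx <- paste0('nominal_', x)   # e.g. "nominal_gdp"
  rx <- paste0('real_', x)      # e.g. "real_gdp"
  # Keep this series' columns and add deflator_<x> = nominal / real * 100
  select(df, matches(x)) %>%
    mutate(!!paste0('deflator_', quo_name(x)) := !!ensym(nx) / !!ensym(rx) * 100)
}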
#Test
deflator_fun(df, 'gdp')
  nominal_gdp     real_gdp deflator_gdp
1  -0.3332074  0.181303480   -183.78433
2  -1.0185754 -0.138891362    733.36121
3  -1.0717912  0.005764186 -18593.97398
4   0.3035286  0.385280401     78.78123

Note: learn more about quo_name, !!, and ensym, which are tools for programming with dplyr, here.


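Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.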

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM