
dplyr::mutate with dynamic variables created from column names

I referred to this and this, but neither worked for me: both rely on functions that call mutate, and those functions are presumably called in a loop, which I cannot do.

I have a df (it can be reproduced with the following code). [[Note: coincidentally, all the "Y_****" columns here hold the same value; please ignore that. The real data frame is very long; I have included only 6 rows here.]]

mainY <- structure(list(PolygonId = 0:5, Area = c(3.018892, 1.995702, 
2.277057, 1.176975, 1.983469, 4.533144), Perimeter = c(10.6415, 
8.6314, 9.2226, 6.1484, 10.2277, 12.0012), X0 = c(0.59, 0.654, 
0.51, 0.6, 0.622, 0.431), Y0 = c(1.4, 1.4, 1.4, 1.4, 1.4, 1.4
), phi = c(0.3, 0.3, 0.3, 0.3, 0.5, 0.3), J0 = c(0.49199, 0.33466, 
0.55057, 0.5076, 0.46434, 0.6574), h0 = c(1669.494, 1656.977, 
1683.435, 1660.62, 1670.445, 1707.416), mat0 = c(0.58, 0.74, 
0.39, 0.67, 0.47, 0.24), tc0 = c(0.4, 0.42, 0.37, 0.41, 0.38, 
0.35), z0 = c(0.8272, 0.8044, 0.744, 0.8505, 1.0288, 0.6703), 
    W0 = c(4764.9472, 3147.8891, 2859.4418, 1974.6163, 4127.504, 
    4670.4702), Y_a02 = c(1.4, 1.4, 1.4, 1.4, 1.4, 1.4), Y_a03 = c(1.4, 
    1.4, 1.4, 1.4, 1.4, 1.4), Y_a04 = c(1.4, 1.4, 1.4, 1.4, 1.4, 
    1.4), Y_b05 = c(1.4, 1.4, 1.4, 1.4, 1.4, 1.4), Y_b06 = c(1.4, 
    1.4, 1.4, 1.4, 1.4, 1.4)), .Names = c("PolygonId", "Area", 
"Perimeter", "X0", "Y0", "phi", "J0", "h0", "mat0", "tc0", "z0", 
"W0", "Y_a02", "Y_a03", "Y_a04", "Y_b05", "Y_b06"), row.names = c(NA, 
6L), class = "data.frame")

This is what it looks like:

[image: screenshot of the data frame]

The actual number of Y_**** columns is more than 20, and it can grow as I add more data.

My code is as follows (the explanation is below the code):

calc.z <- function(x, y, phi, j){
  round(x * y * (phi + sqrt(j)),4)
}
calc.W <- function(tc, h, z, area){
  round(tc * h * pi * sqrt(z^3) * area, 4)
}

I need to calculate z and W for each row. The functions above give the formulas for z and W.

Normally what I'd do is:

newdf<- dplyr::mutate(mainY,
       z_Y_a02  = calc.z(X0, Y_a02 , phi, J0), W_Y_a02  = calc.W(tc0, h0, z_Y_a02,  Area),
       z_Y_a03  = calc.z(X0, Y_a03 , phi, J0), W_Y_a03  = calc.W(tc0, h0, z_Y_a03,  Area))
# ...and so on
# Note: X0, phi and J0 are the same for every z_**** calculation;
# likewise tc0, h0 and Area are the same for every W_**** calculation.

But this is tedious: the same two lines have to be repeated for 20+ columns, and more later.

Following the examples I mentioned above, I rewrote the code as follows:

newdf<- dplyr::mutate_at(mainY,
                      .vars = c("Y_a02","Y_a03","Y_a04","Y_b05","Y_b06"),
                      .funs = calc.z("X0",.vars,"phi","J0")
                      )
# This did not work, so I changed it like this:

newdf<- dplyr::mutate_at(mainY,
                         .vars = c("Y_a02","Y_a03","Y_a04","Y_b05","Y_b06"),
                         .funs = calc.z("X0",.vars,"phi","J0")
                          )
# This does not work either.

The following is the format of the result I expect (## stands for some number):

> newdf
  PolygonId     Area Perimeter    X0  Y0 phi      J0       h0 mat0  tc0     z0       W0 Y_a02 Y_a03 Y_a04 Y_b05 Y_b06   z_Y_a02   W_Y_a02   z_Y_a03   W_Y_a03   z_Y_a04   W_Y_a04   z_Y_b05   W_Y_b05   z_Y_b06   W_Y_b06
1         0 3.018892   10.6415 0.590 1.4 0.3 0.49199 1669.494 0.58 0.40 0.8272 4764.947   1.4   1.4   1.4   1.4   1.4      ##        ##         ##        ##       ##       ##        ##        ##         ##        ##
2         1 1.995702    8.6314 0.654 1.4 0.3 0.33466 1656.977 0.74 0.42 0.8044 3147.889   1.4   1.4   1.4   1.4   1.4      ##        ##         ##        ##       ##       ##        ##        ##         ##        ##
3         2 2.277057    9.2226 0.510 1.4 0.3 0.55057 1683.435 0.39 0.37 0.7440 2859.442   1.4   1.4   1.4   1.4   1.4      ##        ##         ##        ##       ##       ##        ##        ##         ##        ##
4         3 1.176975    6.1484 0.600 1.4 0.3 0.50760 1660.620 0.67 0.41 0.8505 1974.616   1.4   1.4   1.4   1.4   1.4      ##        ##         ##        ##       ##       ##        ##        ##         ##        ##
5         4 1.983469   10.2277 0.622 1.4 0.5 0.46434 1670.445 0.47 0.38 1.0288 4127.504   1.4   1.4   1.4   1.4   1.4      ##        ##         ##        ##       ##       ##        ##        ##         ##        ##
6         5 4.533144   12.0012 0.431 1.4 0.3 0.65740 1707.416 0.24 0.35 0.6703 4670.470   1.4   1.4   1.4   1.4   1.4      ##        ##         ##        ##       ##       ##        ##        ##         ##        ##

This should work:

library(tidyverse)
mainY %>%
  mutate_at(.vars = vars(Y_a02, Y_a03, Y_a04, Y_b05, Y_b06),
            .funs = funs(calc.z = calc.z(X0,.,phi,J0)))
#output

  PolygonId     Area Perimeter    X0  Y0 phi      J0       h0 mat0  tc0     z0       W0 Y_a02 Y_a03 Y_a04
1         0 3.018892   10.6415 0.590 1.4 0.3 0.49199 1669.494 0.58 0.40 0.8272 4764.947   1.4   1.4   1.4
2         1 1.995702    8.6314 0.654 1.4 0.3 0.33466 1656.977 0.74 0.42 0.8044 3147.889   1.4   1.4   1.4
3         2 2.277057    9.2226 0.510 1.4 0.3 0.55057 1683.435 0.39 0.37 0.7440 2859.442   1.4   1.4   1.4
4         3 1.176975    6.1484 0.600 1.4 0.3 0.50760 1660.620 0.67 0.41 0.8505 1974.616   1.4   1.4   1.4
5         4 1.983469   10.2277 0.622 1.4 0.5 0.46434 1670.445 0.47 0.38 1.0288 4127.504   1.4   1.4   1.4
6         5 4.533144   12.0012 0.431 1.4 0.3 0.65740 1707.416 0.24 0.35 0.6703 4670.470   1.4   1.4   1.4
  Y_b05 Y_b06 Y_a02_calc.z Y_a03_calc.z Y_a04_calc.z Y_b05_calc.z Y_b06_calc.z
1   1.4   1.4       0.8272       0.8272       0.8272       0.8272       0.8272
2   1.4   1.4       0.8044       0.8044       0.8044       0.8044       0.8044
3   1.4   1.4       0.7440       0.7440       0.7440       0.7440       0.7440
4   1.4   1.4       0.8505       0.8505       0.8505       0.8505       0.8505
5   1.4   1.4       1.0288       1.0288       1.0288       1.0288       1.0288
6   1.4   1.4       0.6703       0.6703       0.6703       0.6703       0.6703
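For readers on a recent dplyr: funs() has been deprecated since dplyr 0.8.0, and the scoped verbs such as mutate_at() were superseded by across() in dplyr 1.0.0. The following is only a sketch of the same idea written with across(), assuming dplyr 1.0.0 or later. The data frame `demo` is a hypothetical two-row, two-column cut of mainY used to keep the example short; it is not part of the original answer.

```r
library(dplyr)

# Same helper as in the question
calc.z <- function(x, y, phi, j) {
  round(x * y * (phi + sqrt(j)), 4)
}

# Hypothetical reduced copy of mainY (first two rows, two Y_ columns)
demo <- data.frame(
  X0    = c(0.59, 0.654),
  phi   = c(0.3, 0.3),
  J0    = c(0.49199, 0.33466),
  Y_a02 = c(1.4, 1.4),
  Y_a03 = c(1.4, 1.4)
)

# .x is the current Y_ column; X0, phi and J0 are looked up in the data frame.
# .names reproduces the "<column>_calc.z" naming shown in the output above.
res_across <- demo %>%
  mutate(across(c(Y_a02, Y_a03),
                ~ calc.z(X0, .x, phi, J0),
                .names = "{.col}_calc.z"))

print(res_across)
```

Dropping the `.names` argument overwrites the Y_ columns in place, which corresponds to the "replace" variant of the answer.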

If you would rather just replace the Y_**** columns with the result:

mainY %>%
      mutate_at(.vars = vars(Y_a02, Y_a03, Y_a04, Y_b05, Y_b06),
                .funs = funs(calc.z(X0, ., phi, J0)))

Instead of typing all the column names in .vars, you can select them by pattern:

colnames(mainY)[grep("Y_.\\d{2}$", colnames(mainY))]

Alternatively, use "^Y_" as the pattern if none of the other columns start with "Y_".

The pattern "Y_.\\d{2}$" works for the example in the question but may need adjusting for your real table. Used directly inside mutate_at:

mainY %>%
  mutate_at(.vars = colnames(.)[grep("Y_.\\d{2}$", colnames(.))],
            .funs = funs(calc.z = calc.z(X0, ., phi, J0)))
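As a side note on column selection, the tidyselect helpers starts_with() and matches() can stand in for the colnames(.)[grep(...)] idiom. The sketch below only compares what each approach selects. The data frame `demo` is a hypothetical miniature with a few of mainY's column names, and the anchored pattern "^Y_.\\d{2}$" is an assumption about how strict the match should be.

```r
library(dplyr)

# Hypothetical miniature of mainY: only the column names matter here
demo <- data.frame(
  PolygonId = 0:1,
  Y0        = c(1.4, 1.4),
  Y_a02     = c(1.4, 1.4),
  Y_b05     = c(1.4, 1.4)
)

# Base R: value = TRUE returns the matching names instead of their positions
by_grep <- grep("^Y_.\\d{2}$", names(demo), value = TRUE)

# tidyselect helpers, usable inside select(), vars() and across()
by_prefix <- names(select(demo, starts_with("Y_")))
by_regex  <- names(select(demo, matches("^Y_.\\d{2}$")))

print(by_grep)
print(by_prefix)
print(by_regex)
```

Note that Y0 is not picked up by any of the three, because it lacks the underscore after the Y.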

EDIT: answer to the question in the comments.

A way to pass the calculated calc.z values on to calc.W. All columns are kept:

mainY %>%
  # step 1: add one <column>_calc.z column per Y_ column
  mutate_at(.vars = colnames(.)[grep("Y_.\\d{2}$", colnames(.))],
            .funs = funs(calc.z = calc.z(X0, ., phi, J0))) %>%
  # step 2: feed each *_calc.z column into calc.W (not calc.z)
  mutate_at(.vars = colnames(.)[grep("_calc\\.z$", colnames(.))],
            .funs = funs(calc.W = calc.W(tc0, h0, ., Area)))
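If the z_Y_a02 / W_Y_a02 names from the question's expected output are wanted, one possible sketch uses across() with the `.names` argument, assuming dplyr 1.0.0 or later. The data frame `demo` is again a hypothetical two-row cut of mainY. Note that the W step recomputes z from the Y_ column rather than reading the freshly created z_ column, which keeps the naming simple at the cost of a little repeated work.

```r
library(dplyr)

# Helpers as defined in the question
calc.z <- function(x, y, phi, j) {
  round(x * y * (phi + sqrt(j)), 4)
}
calc.W <- function(tc, h, z, area) {
  round(tc * h * pi * sqrt(z^3) * area, 4)
}

# Hypothetical reduced copy of mainY (first two rows, two Y_ columns)
demo <- data.frame(
  Area  = c(3.018892, 1.995702),
  X0    = c(0.59, 0.654),
  phi   = c(0.3, 0.3),
  J0    = c(0.49199, 0.33466),
  h0    = c(1669.494, 1656.977),
  tc0   = c(0.4, 0.42),
  Y_a02 = c(1.4, 1.4),
  Y_a03 = c(1.4, 1.4)
)

newdf <- demo %>%
  mutate(
    # z_<column> for every Y_ column
    across(starts_with("Y_"),
           ~ calc.z(X0, .x, phi, J0),
           .names = "z_{.col}"),
    # W_<column>: z is recomputed inline from the same Y_ column
    across(starts_with("Y_"),
           ~ calc.W(tc0, h0, calc.z(X0, .x, phi, J0), Area),
           .names = "W_{.col}")
  )

print(newdf)
```

The new columns come out grouped (all z_ columns, then all W_ columns) rather than interleaved as in the expected output; they can be reordered afterwards with select() or relocate() if the order matters.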
