Handling missing factor levels in an R function

Question

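I am trying to write a simple function that adjusts the exposed surface area of certain geometric shapes depending on how they are connected to one another. It looks like this: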

funct <- function(A, shape, x) {
radius <- x / 2
A <- dplyr::case_when(
  shape == "sphere" ~ A - (pi * radius^2), 
  shape == "cylinder" ~ A - 2*(pi * radius^2), 
  shape == "ellipsoid" ~ A - (0.2 * A[which(shape == "sphere")] + (2 * pi * radius[which(shape == "cylinder")]))
)
  return(A)
}

This is straightforward, but in real data sets factor levels are often missing, which means the simple adjustment does not work:

testdata <- 
  data.frame(ind = paste(letters[1:10]), A = rnorm(10), shape = rep(c("sphere", "ellipsoid"), each = 5), x = rnorm(10))

testdata$Aadj <- funct(A = testdata$A, shape = testdata$shape, x = testdata$x)
#Error: `shape == "ellipsoid"... must be length 10 or one, not 0
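
The error comes from the ellipsoid branch: with no cylinders in the data, `radius[which(shape == "cylinder")]` has length 0, and that propagates through the arithmetic. A minimal sketch of the mechanism, using made-up values for `A` and `radius` (they are assumptions, not the question's data):

```r
# Same layout as testdata: spheres and ellipsoids only, no cylinders.
shape  <- rep(c("sphere", "ellipsoid"), each = 5)
A      <- 1:10
radius <- rep(0.5, 10)

cyl_radius <- radius[which(shape == "cylinder")]
length(cyl_radius)   # 0: no row matches "cylinder"

# Arithmetic with a zero-length operand returns a zero-length result,
# so the whole ellipsoid right-hand side collapses to length 0.
rhs <- A - (0.2 * A[which(shape == "sphere")] + 2 * pi * cyl_radius)
length(rhs)          # 0
```

`case_when()` then rejects that right-hand side because it is neither length 1 nor the length of the input.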

I can work around this manually by completing the data set:

shapes <- as.vector(c("sphere", "cylinder", "ellipsoid"))
testdata <- tidyr::complete(testdata, ind, shape = shapes, fill=list(A = 0))
testdata$Aadj <- funct(A = testdata$A, shape = testdata$shape, x = testdata$x)

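To make this tidier, I would be interested in suggestions on how to handle missing factor levels inside the function itself. I assume this could be done by first adding them to the data (setting "A" to 0 so the calculation goes through) and then removing them again before returning the data?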

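I would also be interested in suggestions on how to loop over subjects ("ind" in the testdata data frame) inside the function, rather than, for example, setting this up in a dplyr pipe when applying the function.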

Many thanks.
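
One possible way to cover both points in base R is sketched below. It loops over subjects inside the function and treats a shape that is absent for a subject as contributing 0. The helper names `pick`, `adjust_one` and `adjust_by_subject` are hypothetical, and the default of 0 and the summing of repeated shapes are assumptions rather than anything stated in the question. Any shape other than sphere or cylinder is treated as an ellipsoid.

```r
# Hypothetical helper: the value(s) for one shape level within a subject,
# summed; falls back to `default` when that level is absent.
pick <- function(values, shape, level, default = 0) {
  hit <- values[shape == level]
  if (length(hit) == 0) default else sum(hit)
}

# Adjust the surface areas for a single subject (one chunk of the data frame).
adjust_one <- function(d) {
  radius <- d$x / 2
  sph_A  <- pick(d$A, d$shape, "sphere")     # 0 if the subject has no sphere
  cyl_r  <- pick(radius, d$shape, "cylinder") # 0 if the subject has no cylinder
  d$Aadj <- ifelse(d$shape == "sphere", d$A - pi * radius^2,
            ifelse(d$shape == "cylinder", d$A - 2 * pi * radius^2,
                   d$A - (0.2 * sph_A + 2 * pi * cyl_r)))
  d
}

# Split by subject, adjust each piece, and stack the pieces again.
# Rows come back grouped by `ind`, not necessarily in the input order.
adjust_by_subject <- function(data) {
  pieces <- lapply(split(data, data$ind), adjust_one)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Made-up example: subject "a" has a sphere and an ellipsoid, "b" only an ellipsoid.
d <- data.frame(ind = c("a", "a", "b"), A = c(10, 5, 8),
                shape = c("sphere", "ellipsoid", "ellipsoid"), x = c(2, 2, 4))
res <- adjust_by_subject(d)
res$Aadj   # 10 - pi, 3, 8
```

Because the fallback lives in `pick`, no rows have to be added to the data and removed again afterwards.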

Answer 1

In base R this seems to run without an error:

funct <- function(A, shape, x) {
  radius <- x / 2
  sph <- which(shape == "sphere")
  cyl <- which(shape == "cylinder")

  A_new <- ifelse(shape == "sphere", A - (pi * radius^2), 
                  ifelse(shape == "cylinder", A - 2*(pi * radius^2), 
                         A - (0.2 * A[sph] + (2 * pi * radius[cyl]))))
  A_new
}

testdata$Aadj <- funct(A = testdata$A, shape = testdata$shape, x = testdata$x)
testdata

#   ind           A     shape           x       Aadj
#1    a  0.92219266    sphere  1.49043259 -0.8224824
#2    b -0.43705855    sphere  0.21633097 -0.4738145
#3    c  0.66549715    sphere  1.63981414 -1.4464310
#4    d -1.56945688    sphere -1.51169390 -3.3642633
#5    e  0.06975590    sphere -1.68775240 -2.1674572
#6    f  0.02811881 ellipsoid -1.04717409         NA
#7    g  0.95586893 ellipsoid -0.24831690         NA
#8    h  0.79428218 ellipsoid  0.03230311         NA
#9    i  1.86062696 ellipsoid -0.66786452         NA
#10   j  0.53938164 ellipsoid -1.26945744         NA
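
The `NA` values in the ellipsoid rows follow from how `ifelse()` fills its result: the ellipsoid expression has length 0 when no cylinders are present, and `ifelse()` pads a zero-length `no` argument with `NA` instead of raising an error. A small illustration with made-up inputs:

```r
# `no` is zero-length, so the position where test is FALSE is filled with NA.
res <- ifelse(c(TRUE, FALSE), 1, numeric(0))
res   # 1 NA
```

So this version avoids the error, but the ellipsoid adjustment is not actually computed when a shape is missing.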

Answer 2

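Thanks everyone, that was helpful. I thought about it some more and realised that adding a very simple NA-aware function solves the problem while keeping the rest of the syntax. This works better for me because I still want to go ahead with the calculation even when factor levels are missing. Thus: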

sum_ <- function(...) sum(..., na.rm = TRUE)
funct <- function(A, shape, x) {
  radius <- x / 2
  A <- dplyr::case_when(
    shape == "sphere" ~ A - sum_((pi * radius[which(shape == 'cylinder')]^2)), 
    shape == "cylinder" ~ A -  sum_(2*(pi * radius^2)), 
    shape == "ellipsoid" ~ A -  sum_((0.2 * A[which(shape == "sphere")]), (2 * pi * radius[which(shape == "cylinder")]))
  )
  return(A)
}
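
The reason the wrapper helps is that `sum()` over an empty vector is 0, so each term always has length 1 and `case_when()` can recycle it. A short check with made-up inputs:

```r
sum_ <- function(...) sum(..., na.rm = TRUE)

empty_total <- sum_(numeric(0))              # no matching rows
mixed_total <- sum_(c(1, NA), numeric(0))    # NA dropped, empty vector ignored

empty_total   # 0
mixed_total   # 1
```

Note that `sum_()` collapses all matching rows into one total, so if the data hold several subjects the function presumably needs to be applied per subject.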

2 solutions

Solution 1
0 2020-05-22 12:52:59

Solution 2
0 2020-05-22 13:43:12
