R within() function: unexpected error when the final value is NA

Question

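I have come across some unexpected behaviour when using the within() function in R. I (eventually!) tracked the cause down to the situation where the last element of the particular column in question in a data frame contains NA.

I have simplified the code to create a reproducible example. The real-world application in which I ran into this is considerably more complex (data frame with > 500k rows and 400 columns, > 100 lines inside within(), etc.), and avoiding within() would be very inconvenient.

This works as expected: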

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest$Bearing <- NA
fooTest$Bearing[which(fooTest$Group=="Cup")] <-
  as.character(fooTest$CupComposition[which(fooTest$Group=="Cup")])
fooTest$Bearing[which(fooTest$Group=="Shell")] <-
  as.character(fooTest$LinerComposition[which(fooTest$Group=="Shell")])
fooTest$Bearing

whereas this (which should be equivalent) throws an error:

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[which(Group=="Cup")] <-
    as.character(CupComposition[which(Group=="Cup")])
  Bearing[which(Group=="Shell")] <-
    as.character(LinerComposition[which(Group=="Shell")])
})

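The error message is: Error in `[<-.data.frame`(`*tmp*`, nl, value = list(Bearing = c("Polyethylene" : replacement element 1 has 3 rows, need 5

The last two rows, where Group is NA, are evidently not being included. NA rows in the middle of the data are fine.

A couple of questions:

The behaviour of within() is a bit unexpected; is this a bug? I am not very experienced, so I am slightly reluctant to file a bug report for what may well be a gap in my own understanding!
In this particular case, I expect there is a neater way to populate the "Bearing" column than the method I have used. Suggestions welcome!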

Answer 1

Regarding the error message when using within, you could try:

 within(fooTest, {
   Bearing <- NA
   # Logical indices as long as Group; !is.na() keeps NA out of the index
   Bearing[Group == "Cup" & !is.na(Group)] <-
     as.character(CupComposition)[Group == "Cup" & !is.na(Group)]
   Bearing[Group == "Shell" & !is.na(Group)] <-
     as.character(LinerComposition)[Group == "Shell" & !is.na(Group)]
 })
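
As a side note on where the "3 rows" in the error probably comes from, here is a minimal sketch (my reading of the behaviour, not something confirmed in the question). Inside within(), `Bearing <- NA` creates an ordinary length-one vector instead of a recycled column, and assigning by which() positions only grows it as far as the largest index used, which is 3 in the example. A logical index as long as Group stretches it to all five rows. The names `grown`, `stretched`, and `prealloc` are made up for illustration.

```r
# Stand-in for the Group column from the question
Group <- c("Shell", NA, "Cup", NA, NA)

# Integer indexing: the vector grows only to the largest index used
grown <- NA
grown[which(Group == "Cup")] <- "Polyethylene"   # which() gives 3
length(grown)      # 3

# Logical indexing with a full-length, NA-free index stretches it to 5
stretched <- NA
stretched[Group == "Cup" & !is.na(Group)] <- "Polyethylene"
length(stretched)  # 5

# Preallocating to the number of rows works with either indexing style
prealloc <- rep(NA_character_, length(Group))
prealloc[which(Group == "Shell")] <- "Polyethylene"
length(prealloc)   # 5
```

If that reading is right, replacing `Bearing <- NA` with `Bearing <- rep(NA_character_, length(Group))` should let the original which() version run inside within() unchanged.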

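It is not clear whether the Group column and all the other columns follow some order. From the column names, I could not find a common pattern that would help match the elements of Group. Based on the example provided, you could also do this (for a bigger dataset):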

 fooTest1 <- fooTest
 fooTest1[] <- lapply(fooTest1, as.character)  # convert every column to character
 Un1 <- sort(unique(na.omit(fooTest1$Group)))

 # Pair each sorted Group value with the column at the same position
 m1 <- do.call(cbind, Map(function(v, x, y) ifelse(v == y & !is.na(v), x, NA),
                          list(fooTest1[, 1]), fooTest1[, -1], Un1))

 indx1 <- which(!is.na(m1), arr.ind=TRUE)[,1]
 fooTest1$Bearing <- NA
 fooTest1$Bearing[indx1] <- m1[!is.na(m1)]
 fooTest1
 #   Group CupComposition LinerComposition      Bearing
 #1 Shell          Metal     Polyethylene Polyethylene
 #2  <NA>           <NA>             <NA>         <NA>
 #3   Cup   Polyethylene             <NA> Polyethylene
 #4  <NA>           <NA>             <NA>         <NA>
 #5  <NA>           Test             Test         <NA>
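
For the two-group case in the example, a shorter variant is possible with nested ifelse(). This is only a sketch under assumptions: it hard-codes the pairing Cup -> CupComposition and Shell -> LinerComposition, and `fooTest2` is a name invented here so the original data stays untouched.

```r
# Same data as in the question, kept as character columns
fooTest2 <- data.frame(
  Group = c("Shell", NA, "Cup", NA, NA),
  CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
  LinerComposition = c("Polyethylene", NA, NA, NA, "Test"),
  stringsAsFactors = FALSE
)

fooTest2 <- within(fooTest2, {
  # ifelse() returns a vector as long as its test, so Bearing has 5 elements
  Bearing <- ifelse(!is.na(Group) & Group == "Cup", CupComposition,
             ifelse(!is.na(Group) & Group == "Shell", LinerComposition,
                    NA_character_))
})
fooTest2$Bearing
```

Because the result always has one element per row, the length mismatch from the question cannot occur, although this gets unwieldy with more than a handful of groups.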

Answer 2

In situations like this I tend to use %in%; it handles NA better:

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[Group %in% "Cup"] <-
    as.character(CupComposition[Group %in% "Cup"])
  Bearing[Group %in% "Shell"] <-
    as.character(LinerComposition[Group %in% "Shell"])
})
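
To illustrate what "handles NA better" means, here is a small comparison of the two operators on the Group values from the example. The names `eq` and `inn` are just placeholders for this sketch.

```r
Group <- c("Shell", NA, "Cup", NA, NA)

eq  <- Group == "Cup"     # NA wherever Group is NA
inn <- Group %in% "Cup"   # never NA: a missing value simply does not match

eq   # FALSE    NA  TRUE    NA    NA
inn  # FALSE FALSE  TRUE FALSE FALSE
```

Since `Group %in% "Cup"` is a clean logical vector with one element per row, it can be used directly as an index without wrapping it in which() or adding !is.na().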
