
R within() function: unexpected error when last value(s) are NA

I have encountered some unexpected behaviour when using the within() function in R. I (eventually!) tracked the cause to a situation where the last element(s) of the relevant columns in a data frame are NA.

I have simplified the code to create a reproducible example. Obviously the real-world application in which I encountered this is substantially more complex (a data frame with >500k rows and 400 columns, >100 lines inside within(), etc.), so avoiding within() would be rather inconvenient.

This works as expected:

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest$Bearing <- NA
fooTest$Bearing[which(fooTest$Group=="Cup")] <-
  as.character(fooTest$CupComposition[which(fooTest$Group=="Cup")])
fooTest$Bearing[which(fooTest$Group=="Shell")] <-
  as.character(fooTest$LinerComposition[which(fooTest$Group=="Shell")])
fooTest$Bearing

Whereas this (which should be equivalent) throws an error:

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[which(Group=="Cup")] <-
    as.character(CupComposition[which(Group=="Cup")])
  Bearing[which(Group=="Shell")] <-
    as.character(LinerComposition[which(Group=="Shell")])
})

The error message is:

Error in `[<-.data.frame`(`*tmp*`, nl, value = list(Bearing = c("Polyethylene", :
  replacement element 1 has 3 rows, need 5

The last two rows, in which Group is NA, are evidently not being included. NA rows in the middle of the data are OK.

A couple of questions:

  1. The behaviour of within() is a bit unexpected; is this a bug? I am not very experienced, so I am slightly hesitant to file a bug report when it is likely that my own understanding is deficient!

  2. In this particular case, I expect there is a neater way to populate the "Bearing" column than the method I have employed. Suggestions welcome!

Regarding the error message when using within(), you can try:

within(fooTest, {
  Bearing <- NA
  Bearing[Group == 'Cup' & !is.na(Group)] <-
    as.character(CupComposition)[Group == 'Cup' & !is.na(Group)]
  Bearing[Group == 'Shell' & !is.na(Group)] <-
    as.character(LinerComposition)[Group == 'Shell' & !is.na(Group)]
})
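
As far as I can tell, the difference comes from how the vector grows rather than from NA handling as such. Inside within(), `Bearing <- NA` is a plain length-1 vector, whereas `fooTest$Bearing <- NA` is recycled to the number of rows. Here is a minimal sketch of that, assuming the example data above; the names `b_int`, `b_lgl` and `fixed` are made up for illustration, and initialising with `rep(NA_character_, length(Group))` is my own suggestion rather than something from the question:

```r
Group <- c("Shell", NA, "Cup", NA, NA)

# Integer positions from which(): the vector only grows to the largest index.
b_int <- NA
b_int[which(Group == "Cup")] <- "Polyethylene"   # which() returns 3
length(b_int)   # 3, hence "has 3 rows, need 5"

# A logical index of full length extends the vector to the index length.
b_lgl <- NA
b_lgl[Group == "Cup" & !is.na(Group)] <- "Polyethylene"
length(b_lgl)   # 5

# Possible fix (assumption): start from a full-length vector, so the
# original which()-based code can stay as it is.
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fixed <- within(fooTest, {
  Bearing <- rep(NA_character_, length(Group))
  Bearing[which(Group == "Cup")] <-
    as.character(CupComposition[which(Group == "Cup")])
  Bearing[which(Group == "Shell")] <-
    as.character(LinerComposition[which(Group == "Shell")])
})
fixed$Bearing
```

If the last row happened to have Group "Cup" or "Shell", which() would return index 5 and the length-1 version would appear to work, which would explain why the error only shows up when the trailing values are NA.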

It is not clear whether the Group column and all the other columns follow some order. From the column names, I couldn't find a common pattern that helps in matching the elements in Group. Based on the example provided, you could also do the following (for the bigger dataset):

fooTest1 <- fooTest
fooTest1[] <- lapply(fooTest1, as.character)  # convert the columns to character class
Un1 <- sort(unique(na.omit(fooTest1$Group)))

m1 <- do.call(cbind, Map(function(v, x, y) ifelse(v == y & !is.na(v), x, NA),
                         list(fooTest1[, 1]), fooTest1[, -1], Un1))

indx1 <- which(!is.na(m1), arr.ind = TRUE)[, 1]
fooTest1$Bearing <- NA
fooTest1$Bearing[indx1] <- m1[!is.na(m1)]
fooTest1
#   Group CupComposition LinerComposition      Bearing
#1 Shell          Metal     Polyethylene Polyethylene
#2  <NA>           <NA>             <NA>         <NA>
#3   Cup   Polyethylene             <NA> Polyethylene
#4  <NA>           <NA>             <NA>         <NA>
#5  <NA>           Test             Test         <NA>

I tend to use "%in%" in this case; it handles NAs more gracefully:

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[Group %in% "Cup"] <-
    as.character(CupComposition[Group %in% "Cup"])
  Bearing[Group %in% "Shell"] <-
    as.character(LinerComposition[Group %in% "Shell"])
})
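
To illustrate the point about NAs, here is a small sketch comparing `==` with `%in%` on the Group values from the example. The last part is only a suggestion for the "neater way" asked about in the question: a nested ifelse() that assumes there are just the two groups shown, with `bearing` as a made-up name.

```r
Group <- c("Shell", NA, "Cup", NA, NA)

Group == "Cup"     # FALSE    NA  TRUE    NA    NA
Group %in% "Cup"   # FALSE FALSE  TRUE FALSE FALSE

fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))

# Hypothetical one-step alternative: choose the source column per row.
bearing <- with(fooTest,
                ifelse(Group %in% "Cup", as.character(CupComposition),
                ifelse(Group %in% "Shell", as.character(LinerComposition),
                       NA_character_)))
bearing
```

Because `%in%` never returns NA, the result is always a full-length logical index, so it can be used directly for subsetting or as the test in ifelse() without wrapping it in which() or adding `!is.na()`.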
