I have encountered some unexpected behaviour when using the within() function in R. I (eventually!) tracked the cause to a situation where the last element(s) of the relevant columns in a data frame are NA.
I have simplified the code to create a reproducible example. Obviously the real-world application in which I encountered this is substantially more complex (a data frame with >500k rows and 400 columns, >100 lines inside within(), etc.), so avoiding within() would be rather inconvenient.
This works as expected:
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest$Bearing <- NA
fooTest$Bearing[which(fooTest$Group == "Cup")] <-
  as.character(fooTest$CupComposition[which(fooTest$Group == "Cup")])
fooTest$Bearing[which(fooTest$Group == "Shell")] <-
  as.character(fooTest$LinerComposition[which(fooTest$Group == "Shell")])
fooTest$Bearing
Whereas this (which should be equivalent) throws an error:
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[which(Group == "Cup")] <-
    as.character(CupComposition[which(Group == "Cup")])
  Bearing[which(Group == "Shell")] <-
    as.character(LinerComposition[which(Group == "Shell")])
})
The error message is:
Error in `[<-.data.frame`(`*tmp*`, nl, value = list(Bearing = c("Polyethylene", : replacement element 1 has 3 rows, need 5
The last two rows, in which Group is NA, are evidently not being included. NA rows in the middle of the data are OK.
A couple of questions:
The behaviour of within() is a bit unexpected; is this a bug? I am not very experienced, so I am slightly reluctant to file a bug report when it is likely that my own understanding is at fault!
In this particular case, I expect there is a neater way to populate the "Bearing" column than the method I have employed. Suggestions welcome!
Regarding the error message from within(), you can try:
within(fooTest, {
  Bearing <- NA
  Bearing[Group == 'Cup' & !is.na(Group)] <-
    as.character(CupComposition)[Group == 'Cup' & !is.na(Group)]
  Bearing[Group == 'Shell' & !is.na(Group)] <-
    as.character(LinerComposition)[Group == 'Shell' & !is.na(Group)]
})
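As for why the which() version fails: as far as I can tell, this is not a bug in within(). Inside within(), Bearing <- NA creates a plain length-one vector rather than a five-row column. Assigning by integer position only grows that vector up to the largest index used (3 here, because the last matching row is row 3), so within() ends up trying to store a length-3 vector in a 5-row data frame. A logical index has one entry per row, so the vector is extended to the full length. With fooTest$Bearing <- NA the value is recycled to a full column straight away, which is why the first version works. A minimal sketch outside within(), where b1, b2 and b3 are illustrative names that do not appear in the question:

```r
Group <- c("Shell", NA, "Cup", NA, NA)

# Integer index: the vector only grows to the largest position assigned (3).
b1 <- NA
b1[which(Group == "Cup")] <- "x"
length(b1)  # 3

# Logical index of length 5: the vector is extended to length 5.
b2 <- NA
b2[Group == "Cup" & !is.na(Group)] <- "x"
length(b2)  # 5

# Pre-sizing the vector makes the which() form safe as well.
b3 <- rep(NA_character_, length(Group))
b3[which(Group == "Cup")] <- "x"
length(b3)  # 5
```

So another possible fix for the original code is to start the within() block with Bearing <- rep(NA_character_, length(Group)) and leave the which() calls as they are.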
It is not clear whether the Group column and the other columns follow some order. From the column names, I couldn't find a common pattern that helps in matching them to the elements of Group. Based on the example provided, you could also do the following (for the bigger dataset):
fooTest1 <- fooTest
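fooTest1[] <- lapply(fooTest1, as.character)  # convert every column to character
Un1 <- sort(unique(na.omit(fooTest1$Group)))  # group labels, sorted: "Cup", "Shell"
# For each label, keep the value from the corresponding composition column, NA elsewhere.
# This relies on the sorted labels lining up with the order of columns 2 and 3.
m1 <- do.call(cbind, Map(function(v, x, y) ifelse(v == y & !is.na(v), x, NA),
                         list(fooTest1[, 1]), fooTest1[, -1], Un1))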
indx1 <- which(!is.na(m1), arr.ind=TRUE)[,1]
fooTest1$Bearing <- NA
fooTest1$Bearing[indx1] <- m1[!is.na(m1)]
fooTest1
#   Group CupComposition LinerComposition      Bearing
#1  Shell          Metal     Polyethylene Polyethylene
#2   <NA>           <NA>             <NA>         <NA>
#3    Cup   Polyethylene             <NA> Polyethylene
#4   <NA>           <NA>             <NA>         <NA>
#5   <NA>           Test             Test         <NA>
I tend to use "%in%" in this case; it handles NAs more gracefully, returning FALSE rather than NA when Group is NA:
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[Group %in% "Cup"] <-
    as.character(CupComposition[Group %in% "Cup"])
  Bearing[Group %in% "Shell"] <-
    as.character(LinerComposition[Group %in% "Shell"])
})
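On the second question (a neater way to populate Bearing), one option might be a nested ifelse(), which always returns a vector as long as its test and so avoids the length problem altogether. This is only a sketch that assumes the two groups from the example; fooTest2 is an illustrative name, not something from the question:

```r
fooTest2 <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                       CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                       LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))

fooTest2 <- within(fooTest2, {
  # %in% yields FALSE for NA groups, so those rows fall through to NA_character_.
  Bearing <- ifelse(Group %in% "Cup", as.character(CupComposition),
                    ifelse(Group %in% "Shell", as.character(LinerComposition),
                           NA_character_))
})
fooTest2$Bearing
# [1] "Polyethylene" NA             "Polyethylene" NA             NA
```

With many groups the nesting gets unwieldy, so this is probably best kept to a handful of cases.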