
Replace NA values of subgroups in different columns with other values in a separate column

My problem:

Tom_dog <- c(1,4,NA,6,10,5)
Joe_dog <- c(2,NA,8,10,12,5)
Theo_dog <- c(5,1,6,8,NA,7)
Gus_cat <- c(9,10,14,12,13,NA)
Walz_cat <- c(NA, 9,8,7,4,2)
Ron_cat <- c(15,13,NA,2,5,6)
df <- data.frame(Tom_dog,Joe_dog,Theo_dog,Gus_cat,Walz_cat,Ron_cat)

I calculate the mean for the dogs and for the cats and attach each one to the data frame as a new column:

df$dog_mean <- rowMeans(df[ , grepl("^.+(_dog)$", colnames(df))], na.rm = TRUE)
df$cat_mean <- rowMeans(df[ , grepl("^.+(_cat)$", colnames(df))], na.rm = TRUE)
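With more than two animal groups, one rowMeans line per group gets repetitive. A minimal sketch, assuming every column name ends in _<group> and that the group names are listed by hand in a hypothetical groups vector, builds all the mean columns in one loop:

```r
df <- data.frame(
  Tom_dog  = c(1, 4, NA, 6, 10, 5),
  Joe_dog  = c(2, NA, 8, 10, 12, 5),
  Theo_dog = c(5, 1, 6, 8, NA, 7),
  Gus_cat  = c(9, 10, 14, 12, 13, NA),
  Walz_cat = c(NA, 9, 8, 7, 4, 2),
  Ron_cat  = c(15, 13, NA, 2, 5, 6)
)

groups <- c("dog", "cat")  # hypothetical: the suffixes to average over

for (g in groups) {
  # columns whose names end in "_dog", "_cat", ...
  cols <- endsWith(names(df), paste0("_", g))
  # row-wise mean of that group, ignoring NAs, stored as e.g. "dog_mean"
  df[[paste0(g, "_mean")]] <- rowMeans(df[, cols, drop = FALSE], na.rm = TRUE)
}
```

Because the new columns are named <group>_mean, they never end in _<group> themselves, so later iterations do not pick them up.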

Now, what I would like to do is replace each NA value of the dogs with the dogs' mean in the same row. In a second step I want to do the same for the cats. I tried something like this, but it didn't work:

df[ , grepl("^.+(_dog)$", colnames(df))][is.na(df[ , grepl("^.+(_dog)$", colnames(df))])]
<- df$dog_mean[is.na(df[ , grepl("^.+(_dog)$", colnames(df))])]

Help greatly appreciated!
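Two things go wrong in that attempt. If the <- really starts a new line, as it does here, R treats the first line as a complete expression and then fails on the second with an "unexpected assignment" parse error. Even on one line, is.na() on the dog columns returns a 6 x 3 logical matrix, and indexing the length-6 vector df$dog_mean with it does not look up each row's mean: the matrix is read as an 18-element logical vector, so only NAs in the first dog column hit the right row, and positions past 6 return NA. A minimal sketch that keeps the single-step idea uses base R's row() (a swapped-in technique, not from the question) to turn each NA position into its row number; only the dogs are shown, the cats work the same way:

```r
df <- data.frame(
  Tom_dog  = c(1, 4, NA, 6, 10, 5),
  Joe_dog  = c(2, NA, 8, 10, 12, 5),
  Theo_dog = c(5, 1, 6, 8, NA, 7)
)
df$dog_mean <- rowMeans(df[, grepl("^.+(_dog)$", colnames(df))], na.rm = TRUE)

dog_cols <- grepl("^.+(_dog)$", colnames(df))
dogs <- as.matrix(df[, dog_cols])
miss <- is.na(dogs)  # TRUE wherever a dog value is missing

# row(dogs) holds each cell's row number; subsetting it with `miss` gives the
# row of every NA in the same column-major order that dogs[miss] uses
dogs[miss] <- df$dog_mean[row(dogs)[miss]]

df[, dog_cols] <- dogs  # write the filled block back into the data frame
```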

Instead of trying to do the transformation in a single step, you might be better off with an lapply call that converts one column at a time (I'm using magrittr here just to avoid typing the entire first line twice):

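library( magrittr )
# %<>% pipes the dog columns through lapply() and assigns the result back to df
df[ , grepl("^.+(_dog)$", colnames(df))] %<>%
    lapply( function( x, vals ) {
        # use the row mean wherever x is NA, otherwise keep x
        ifelse( is.na( x ), vals, x )
    },
    vals = df$dog_mean )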

And the same for cats:

df[ , grepl("^.+(_cat)$", colnames(df))] %<>%
    lapply( function( x, vals ) {
        ifelse( is.na( x ), vals, x )
    },
    vals = df$cat_mean )
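The dog and cat calls differ only in the suffix and the mean column, so they can also be folded into one loop. A minimal sketch without magrittr, reusing the same ifelse() step and the question's regular expression (the loop itself is an assumption, not part of the answer):

```r
df <- data.frame(
  Tom_dog  = c(1, 4, NA, 6, 10, 5),
  Joe_dog  = c(2, NA, 8, 10, 12, 5),
  Theo_dog = c(5, 1, 6, 8, NA, 7),
  Gus_cat  = c(9, 10, 14, 12, 13, NA),
  Walz_cat = c(NA, 9, 8, 7, 4, 2),
  Ron_cat  = c(15, 13, NA, 2, 5, 6)
)
df$dog_mean <- rowMeans(df[ , grepl("^.+(_dog)$", colnames(df))], na.rm = TRUE)
df$cat_mean <- rowMeans(df[ , grepl("^.+(_cat)$", colnames(df))], na.rm = TRUE)

for (g in c("dog", "cat")) {
  cols  <- grepl(paste0("^.+(_", g, ")$"), colnames(df))  # e.g. "^.+(_dog)$"
  means <- df[[paste0(g, "_mean")]]                       # e.g. df$dog_mean
  # same ifelse() step as above, applied column by column
  df[ , cols] <- lapply(df[ , cols], function(x) ifelse(is.na(x), means, x))
}
```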

In base R, you can do this with two passes of lapply:

# dogs
df[, grepl("_dog", names(df))] <- lapply(df[, grepl("_dog", names(df))],
                                       function(i) {i[is.na(i)] <- df$dog_mean[is.na(i)]; i})
# cats
df[, grepl("_cat", names(df))] <- lapply(df[, grepl("_cat", names(df))],
                                       function(i) {i[is.na(i)] <- df$cat_mean[is.na(i)]; i})

Here, the list that lapply returns is fed back into the appropriate columns of the data.frame. The braces {} group the two statements (separated by ;) into a single function body, so each call fills the NAs in i and then returns the modified i.
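If the braces and the ; feel clumsy, base R's replace(x, list, values) returns a copy of x with the positions in list overwritten by values, so the anonymous function collapses to a single expression. A sketch of the dog step written that way (a variation on the answer above, not the answerer's code; the cat step is analogous):

```r
df <- data.frame(
  Tom_dog  = c(1, 4, NA, 6, 10, 5),
  Joe_dog  = c(2, NA, 8, 10, 12, 5),
  Theo_dog = c(5, 1, 6, 8, NA, 7)
)
df$dog_mean <- rowMeans(df[, grepl("_dog", names(df))], na.rm = TRUE)

dog_cols <- grepl("_dog", names(df))  # "dog_mean" does not contain "_dog"
df[, dog_cols] <- lapply(df[, dog_cols],
                         # overwrite the NA positions of i with the matching row means
                         function(i) replace(i, is.na(i), df$dog_mean[is.na(i)]))
```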

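Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.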

 