
Replace NA values using an if statement based on group by

I am looking to do the following in a more elegant manner in R. I believe there is a way, but I just can't wrap my head around it. The problem is as follows.

I have a data frame that contains NAs. Within each group, I want to turn the NAs into zeros if the group has at least one non-NA value (that is, the group sum is not NA). If every value in the group is NA, the NAs should be left as NA. The example below should make it clear.

A<-c("A", "A", "A", "A", 
     "B","B","B","B",
     "C","C","C","C")
B<-c(1,NA,NA,1,NA,NA,NA,NA,2,1,2,3)
data<-data.frame(A,B)

This is what the data looks like:

   A  B
1  A  1
2  A NA
3  A NA
4  A  1
5  B NA
6  B NA
7  B NA
8  B NA
9  C  2
10 C  1
11 C  2
12 C  3

And I am looking to get the following result:

   A  B
1  A  1
2  A  0
3  A  0
4  A  1
5  B NA
6  B NA
7  B NA
8  B NA
9  C  2
10 C  1
11 C  2
12 C  3

I know I can do this with a join, by creating a summary table first and then writing an IF statement based on that table, but I was wondering if there is a way to do it in one or two lines of code in R.

This is the join-based solution I was referring to:

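library(dplyr)

# Group sum that stays NA when every value in the group is NA
# (NA_real_ matches the double returned by sum())
sum_NA <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# One row per group: x is the group sum, Y flags groups that are entirely NA
data2 <- data %>%
  group_by(A) %>%
  summarize(x = sum_NA(B), Y = is.na(x))
data2

# Attach the per-group flag to every row
data2_1 <- right_join(data, data2, by = "A")

# Zero out NAs only in groups that have at least one non-NA value
data <- mutate(data2_1, B = ifelse(!Y & is.na(B), 0, B))
data <- select(data, -Y, -x)
data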

Maybe a solution like this would work:

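# Refer to the columns through data$ so the code does not depend on
# the standalone vectors A and B still being in the workspace
data$B[is.na(data$B) & data$A %in% unique(na.omit(data)$A)] <- 0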

Here you're asking:

  • if B is NA
  • if A is one of the letters that has non-NA values

Then make those values 0.
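The two conditions above can be sketched step by step. This is only an illustration on the question's example data; the intermediate names (groups_with_values, is_missing, in_group_with_values, to_zero) are made up here and are not part of the original answer. Note that na.omit() drops a row if any column is NA, so this approach assumes the data frame holds only the columns A and B.

```r
# Rebuild the example data from the question
data <- data.frame(
  A = rep(c("A", "B", "C"), each = 4),
  B = c(1, NA, NA, 1, NA, NA, NA, NA, 2, 1, 2, 3)
)

# Groups that still have at least one row after dropping NA rows
groups_with_values <- unique(na.omit(data)$A)

# Condition 1: B is missing
is_missing <- is.na(data$B)

# Condition 2: the row's group has some non-NA value
in_group_with_values <- data$A %in% groups_with_values

# Rows meeting both conditions are the ones that get set to 0
to_zero <- is_missing & in_group_with_values
which(to_zero)
# 2 3

data$B[to_zero] <- 0
```

Group B never appears in groups_with_values because all of its rows are removed by na.omit(), which is why its NAs are left alone.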

Or similarly, using ifelse():

data$B <- ifelse(is.na(data$B) & data$A %in% unique(na.omit(data)$A), 0, data$B)
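A per-group variant in base R might look like the sketch below. It is an alternative not given in the original answer: ave() applies a function to B within each level of A, and the helper name fill_group is hypothetical. Unlike the na.omit() approach, it looks only at column B, so extra columns containing NAs should not affect it.

```r
# Rebuild the example data from the question
data <- data.frame(
  A = rep(c("A", "B", "C"), each = 4),
  B = c(1, NA, NA, 1, NA, NA, NA, NA, 2, 1, 2, 3)
)

# Hypothetical helper: leave an all-NA group untouched,
# otherwise replace the NAs in that group with 0
fill_group <- function(x) {
  if (all(is.na(x))) x else replace(x, is.na(x), 0)
}

# ave() runs fill_group separately for each value of A
# and returns a vector in the original row order
data$B <- ave(data$B, data$A, FUN = fill_group)
data
```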

Or with dplyr:

library(dplyr)
data %>%
  mutate(B=ifelse(is.na(B) & A %in% unique(na.omit(data)$A), 0, B))
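Since the question is phrased in terms of group by, a grouped dplyr version may also be worth sketching. This is an assumption about what reads most naturally, not something from the original answer: it uses group_by() with an if inside mutate(), and coalesce() to fill the NAs. The name filled is chosen here for illustration.

```r
library(dplyr)

# Rebuild the example data from the question
data <- data.frame(
  A = rep(c("A", "B", "C"), each = 4),
  B = c(1, NA, NA, 1, NA, NA, NA, NA, 2, 1, 2, 3)
)

filled <- data %>%
  group_by(A) %>%
  # Inside a grouped mutate, B is the vector for the current group:
  # keep it as-is when it is entirely NA, otherwise fill NAs with 0
  mutate(B = if (all(is.na(B))) B else coalesce(B, 0)) %>%
  ungroup()

filled
```

The if here is evaluated once per group rather than once per row, which is what lets it act as the "if statement based on group by" from the title.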

