How to mutate NA values across multiple rows (rowwise) in a tibble

Question

I spent some time trying to figure out how to mutate NA values across multiple rows, working row by row, in a tibble. The tibble has 3 observations and 6 variables and is generated as follows:

library(dplyr)  # provides %>%, mutate(), and re-exports as_tibble()

df <- data.frame(ID = c(1, 2, 3),
                 Score1 = c(90, 80, 70),
                 Score2 = c(66, 78, 86),
                 Score3 = c(NA, 86, 96),
                 Score4 = c(84, 76, 72),
                 Score5 = c(92, NA, 74))
sample_tibble <- as_tibble(df)

The tibble looks like this:

# A tibble: 3 x 6
     ID Score1 Score2 Score3 Score4 Score5
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1     90     66     NA     84     92
2     2     80     78     86     76     NA
3     3     70     86     96     72     74

I have to use functions from the tidyverse (e.g. mutate, mutate_at, rowwise, etc.). The goal is to replace the NA in row 1 (column Score3) and the NA in row 2 (column Score5) with the mean of row 1 and row 2 respectively, where each mean is calculated from the row's non-NA score values. The ideal result after mutating would be:

# A tibble: 3 x 6
     ID Score1 Score2 Score3 Score4 Score5
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1     90     66     83     84     92
2     2     80     78     86     76     80
3     3     70     86     96     72     74

The first NA is replaced by mean(c(90, 66, NA, 84, 92), na.rm = TRUE), which is 83.
The second NA is replaced by mean(c(80, 78, 86, 76, NA), na.rm = TRUE), which is 80.

I tried code like the snippet below, and also checked earlier posts such as "Apply a function to every row of a matrix or a data frame" and "dplyr - using mutate() like rowmeans()", but the code never worked because I could not figure out the body of the mutate call:

sample_tibble[, -1] %>% rowwise() %>% mutate(...)

The solution does not have to use rowwise or mutate (mutate_at is fine too). Is there a way to mutate rows 1 and 2 to reach the target format? Ideally both rows would be handled in one step rather than with a for loop that mutates twice. I appreciate any solutions!

Answer 1

A slightly inefficient way is to gather the data into long format and group_by ID:

sample_tibble %>%
  tidyr::gather(k, v, -ID) %>%
  group_by(ID) %>%
  mutate(v = if_else(is.na(v), mean(v, na.rm = TRUE), v)) %>%
  ungroup() %>%
  tidyr::spread(k, v)
# # A tibble: 3 x 6
#      ID Score1 Score2 Score3 Score4 Score5
#   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1     1     90     66     83     84     92
# 2     2     80     78     86     76     80
# 3     3     70     86     96     72     74

As RonakShah also reminded me, gather / spread can be replaced with their newer (and more featureful) cousins, pivot_longer / pivot_wider.
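For reference, the same pipeline with pivot_longer / pivot_wider might look like the sketch below. It assumes tidyr 1.0.0 or later; the names result_pivot, k, and v are only illustrative, and the tibble is rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so this snippet is self-contained
sample_tibble <- tibble(
  ID = c(1, 2, 3),
  Score1 = c(90, 80, 70),
  Score2 = c(66, 78, 86),
  Score3 = c(NA, 86, 96),
  Score4 = c(84, 76, 72),
  Score5 = c(92, NA, 74)
)

result_pivot <- sample_tibble %>%
  # Long format: one row per (ID, score column) pair
  pivot_longer(-ID, names_to = "k", values_to = "v") %>%
  group_by(ID) %>%
  # Within each ID, fill NA with the mean of that ID's non-missing scores
  mutate(v = if_else(is.na(v), mean(v, na.rm = TRUE), v)) %>%
  ungroup() %>%
  # Back to wide format: one column per score
  pivot_wider(names_from = k, values_from = v)

result_pivot
```

Like the gather version, this relies on ID uniquely identifying each row.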

Another technique uses apply:

sample_tibble %>%
  mutate(mu = apply(.[,-1], 1, mean, na.rm = TRUE)) %>%
  ### similarly, and faster, thanks RonakShah
  # mutate(mu = rowMeans(.[,-1], na.rm = TRUE)) %>%
  mutate_at(vars(starts_with("Score")), ~ if_else(is.na(.), mu, .)) %>%
  select(-mu)

A caveat with this: .[,-1] explicitly uses every column except the first. If your data has other columns that were not mentioned in the question, this will certainly use more data than you intend. Unfortunately, I don't know of a way to use ':' column ranges in this solution, which would be clearer.
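One possible way to get ':' ranges, offered as a sketch rather than as part of the original answer, is to wrap the column choice in select() and to use across() in place of mutate_at. This assumes dplyr 1.0.0 or later (for across) and the magrittr pipe, which supplies the '.' placeholder; result_range is an illustrative name.

```r
library(dplyr)

# Same data as in the question, rebuilt so this snippet is self-contained
sample_tibble <- tibble(
  ID = c(1, 2, 3),
  Score1 = c(90, 80, 70),
  Score2 = c(66, 78, 86),
  Score3 = c(NA, 86, 96),
  Score4 = c(84, 76, 72),
  Score5 = c(92, NA, 74)
)

result_range <- sample_tibble %>%
  # Row means over an explicit column range instead of "everything but column 1"
  mutate(mu = rowMeans(select(., Score1:Score5), na.rm = TRUE)) %>%
  # Fill NA in the same range with the row mean held in the helper column mu
  mutate(across(Score1:Score5, ~ if_else(is.na(.x), mu, .x))) %>%
  # Drop the helper column
  select(-mu)

result_range
```

Because the range is named explicitly, extra columns added to the data later are left out of both the mean and the replacement.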

Answer 2

One approach using a little bit of maths could be:

df %>%
 mutate_at(vars(-1), 
           ~ pmax(is.na(.)*rowMeans(select(df, -1), na.rm = TRUE), 
                  (!is.na(.))*., 
                  na.rm = TRUE))


  ID Score1 Score2 Score3 Score4 Score5
1  1     90     66     83     84     92
2  2     80     78     86     76     80
3  3     70     86     96     72     74
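Note that the pmax trick compares each value against 0, so it relies on the scores being non-negative: a negative non-missing value would be replaced by 0. A variation that avoids this assumption, sketched here with dplyr's coalesce() and across() (dplyr 1.0.0 or later; row_mu and result_coalesce are illustrative names), could be:

```r
library(dplyr)

# Same data as in the question, rebuilt so this snippet is self-contained
df <- data.frame(ID = c(1, 2, 3),
                 Score1 = c(90, 80, 70),
                 Score2 = c(66, 78, 86),
                 Score3 = c(NA, 86, 96),
                 Score4 = c(84, 76, 72),
                 Score5 = c(92, NA, 74))

# Row means over the score columns, computed once up front
row_mu <- unname(rowMeans(select(df, -ID), na.rm = TRUE))

result_coalesce <- df %>%
  # coalesce() keeps each value where present and falls back to row_mu where NA
  mutate(across(-ID, ~ coalesce(.x, row_mu)))

result_coalesce
```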

Answer 1: 4 votes, accepted, 2020-02-08 08:02:53
Answer 2: 1 vote, 2020-02-08 08:31:02
