How to mutate NA values on multiple rows (row-wise) in a tibble
I spent some time trying to figure out how to mutate NA values on multiple rows, row by row, in a tibble. The tibble has 3 observations and 6 variables and is generated as below:
library(dplyr)  # provides as_tibble(), mutate() and the %>% pipe used below
df <- data.frame(ID = c(1, 2, 3),
                 Score1 = c(90, 80, 70),
                 Score2 = c(66, 78, 86),
                 Score3 = c(NA, 86, 96),
                 Score4 = c(84, 76, 72),
                 Score5 = c(92, NA, 74))
sample_tibble <- as_tibble(df)
The tibble looks as follows:
# A tibble: 3 x 6
ID Score1 Score2 Score3 Score4 Score5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 90 66 NA 84 92
2 2 80 78 86 76 NA
3 3 70 86 96 72 74
I have to use functions from the tidyverse (e.g. mutate, mutate_at, rowwise, etc.). The goal is to replace the NA in row 1 (in the Score3 column) and the NA in row 2 (in the Score5 column) with the mean of row 1 and row 2, respectively, where each mean is calculated from the other, non-NA values in that row. The ideal result after the mutate should therefore be:
# A tibble: 3 x 6
ID Score1 Score2 Score3 Score4 Score5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 90 66 83 84 92
2 2 80 78 86 76 80
3 3 70 86 96 72 74
The first NA is replaced by mean(c(90, 66, NA, 84, 92), na.rm = TRUE), which is 83.
The second NA is replaced by mean(c(80, 78, 86, 76, NA), na.rm = TRUE), which is 80.
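Both numbers can be checked directly in R. This minimal sketch recomputes the two means and then, as an illustration, computes the mean of every row at once with base R's rowMeans() over the Score columns only (ID is deliberately left out, matching the examples above). The names scores, m1, m2 and row_mu are introduced here for illustration:

```r
# Only the Score columns; ID is not part of the row mean
scores <- data.frame(
  Score1 = c(90, 80, 70),
  Score2 = c(66, 78, 86),
  Score3 = c(NA, 86, 96),
  Score4 = c(84, 76, 72),
  Score5 = c(92, NA, 74)
)

# Row 1: (90 + 66 + 84 + 92) / 4 = 332 / 4 = 83
m1 <- mean(c(90, 66, NA, 84, 92), na.rm = TRUE)
# Row 2: (80 + 78 + 86 + 76) / 4 = 320 / 4 = 80
m2 <- mean(c(80, 78, 86, 76, NA), na.rm = TRUE)

# The same computation for every row in one vectorised call (no loop)
row_mu <- rowMeans(scores, na.rm = TRUE)
row_mu
#> [1] 83.0 80.0 79.6
```

Row 3 has no NA, so its mean (79.6) is computed but would never be used as a replacement.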
I tried some code like the one below, and also checked previous questions such as "Apply a function to every row of a matrix or a data frame" and "dplyr - using mutate() like rowmeans()", but the code never worked because I was unable to figure out the body of the mutate function:
sample_tibble[, -1] %>% rowwise() %>% mutate(...)
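For illustration only, here is one way a rowwise() pipeline like the attempt above could be completed. It is a sketch assuming dplyr >= 1.0.0 (for c_across() and across()); the helper column mu is a hypothetical name introduced here and dropped at the end:

```r
library(tibble)
library(dplyr)  # c_across() and across() need dplyr >= 1.0.0

sample_tibble <- tibble(
  ID     = c(1, 2, 3),
  Score1 = c(90, 80, 70),
  Score2 = c(66, 78, 86),
  Score3 = c(NA, 86, 96),
  Score4 = c(84, 76, 72),
  Score5 = c(92, NA, 74)
)

filled <- sample_tibble %>%
  rowwise() %>%
  # per-row mean of the Score columns only, ignoring NA
  mutate(mu = mean(c_across(starts_with("Score")), na.rm = TRUE)) %>%
  ungroup() %>%
  # in every Score column, replace NA with that row's mean
  mutate(across(starts_with("Score"), ~ coalesce(.x, mu))) %>%
  select(-mu)

filled
```

rowwise() evaluates the mean one row at a time, so on large data a vectorised rowMeans() is likely to be faster.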
I'm not limited to rowwise or mutate (mutate_at is also fine). Is there any solution that can mutate rows 1 and 2 to reach the target format? Ideally both rows would be mutated at the same time, rather than using a for loop to mutate twice. I appreciate any solutions!
A slightly inefficient way would be to gather and group_by it:
sample_tibble %>%
tidyr::gather(k, v, -ID) %>%
group_by(ID) %>%
mutate(v = if_else(is.na(v), mean(v, na.rm = TRUE), v)) %>%
ungroup() %>%
tidyr::spread(k, v)
# # A tibble: 3 x 6
# ID Score1 Score2 Score3 Score4 Score5
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 90 66 83 84 92
# 2 2 80 78 86 76 80
# 3 3 70 86 96 72 74
As RonakShah also reminded me, gather/spread can be replaced with their newer (and more featureful) cousins, pivot_longer/pivot_wider.
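A minimal sketch of the gather/spread answer rewritten with pivot_longer()/pivot_wider(), assuming tidyr >= 1.0.0. The logic (reshape to long format, group by ID, fill each NA with its group mean, reshape back) is the same as above; k and v are kept as the key/value column names:

```r
library(tibble)
library(dplyr)
library(tidyr)  # pivot_longer()/pivot_wider() need tidyr >= 1.0.0

sample_tibble <- tibble(
  ID     = c(1, 2, 3),
  Score1 = c(90, 80, 70),
  Score2 = c(66, 78, 86),
  Score3 = c(NA, 86, 96),
  Score4 = c(84, 76, 72),
  Score5 = c(92, NA, 74)
)

filled <- sample_tibble %>%
  # one row per (ID, Score column) pair
  pivot_longer(-ID, names_to = "k", values_to = "v") %>%
  group_by(ID) %>%
  # within each ID, fill NA with the mean of the non-NA scores
  mutate(v = if_else(is.na(v), mean(v, na.rm = TRUE), v)) %>%
  ungroup() %>%
  # back to one column per score
  pivot_wider(names_from = k, values_from = v)

filled
```

Unlike spread(), pivot_wider() keeps the new columns in order of first appearance rather than sorting them alphabetically; for Score1 to Score5 both orders happen to coincide.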
Another technique uses apply:
sample_tibble %>%
mutate(mu = apply(.[,-1], 1, mean, na.rm = TRUE)) %>%
### similarly, and faster, thanks RonakShah
# mutate(mu = rowMeans(.[,-1], na.rm = TRUE)) %>%
mutate_at(vars(starts_with("Score")), ~ if_else(is.na(.), mu, .)) %>%
select(-mu)
A caveat with this: the .[,-1] explicitly uses every column except the first; if you have other columns that were not mentioned in the question, then this will certainly use more data than you intend. Unfortunately, I don't know of a way to use :-ranging in this solution, as that would be clearer.
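One possible way around the .[,-1] caveat, assuming dplyr >= 1.0.0, is to name the columns explicitly inside select(), which does accept :-ranges such as Score1:Score5. This sketch also swaps the superseded mutate_at() for across() and uses coalesce() in place of if_else(is.na(.), mu, .); mu is again a hypothetical helper column:

```r
library(tibble)
library(dplyr)  # across() needs dplyr >= 1.0.0

sample_tibble <- tibble(
  ID     = c(1, 2, 3),
  Score1 = c(90, 80, 70),
  Score2 = c(66, 78, 86),
  Score3 = c(NA, 86, 96),
  Score4 = c(84, 76, 72),
  Score5 = c(92, NA, 74)
)

filled <- sample_tibble %>%
  # row means over an explicit column range only; the %>% dot is the piped tibble
  mutate(mu = rowMeans(select(., Score1:Score5), na.rm = TRUE)) %>%
  # fill NA in the same range with the row mean
  mutate(across(Score1:Score5, ~ coalesce(.x, mu))) %>%
  select(-mu)

filled
```

Any extra columns outside Score1:Score5 are neither averaged nor modified. The `.` relies on the magrittr %>% pipe and would not work with the native |> pipe.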
One approach utilizing a little bit of maths could be:
df %>%
mutate_at(vars(-1),
~ pmax(is.na(.)*rowMeans(select(df, -1), na.rm = TRUE),
(!is.na(.))*.,
na.rm = TRUE))
ID Score1 Score2 Score3 Score4 Score5
1 1 90 66 83 84 92
2 2 80 78 86 76 80
3 3 70 86 96 72 74
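To see why the pmax() trick works, it helps to apply it to a single column. This sketch uses Score3 and the row means 83, 80 and 79.6 computed from the data above. It also shows an assumption built into the trick: because each observed value competes with 0 inside pmax(), the result is only correct when all scores are non-negative. dplyr's coalesce() is shown as an alternative without that restriction; the names x, mu, y, filled, bad and good are illustrative:

```r
library(dplyr)  # for coalesce()

x  <- c(NA, 86, 96)    # the Score3 column
mu <- c(83, 80, 79.6)  # row means of the Score columns, NA ignored

is.na(x) * mu          # 83 0 0: the row mean where x is NA, 0 elsewhere
(!is.na(x)) * x        # NA 86 96: FALSE * NA is still NA in R
filled <- pmax(is.na(x) * mu, (!is.na(x)) * x, na.rm = TRUE)
filled
#> [1] 83 86 96

# With a negative observed value, pmax() keeps the 0 instead of -5
y <- c(NA, -5)
bad <- pmax(is.na(y) * c(10, 10), (!is.na(y)) * y, na.rm = TRUE)
bad
#> [1] 10  0

# coalesce() takes the first non-NA value, so -5 survives
good <- coalesce(y, c(10, 10))
good
#> [1] 10 -5
```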