Replacing only some NA values, not all
Assume that you have a dataset like starwars. Also assume that you have two columns: one numeric with 20 NA values, and the other with the species (Human, Droid, machine, etc.).
How can I use pipes to replace only the NA values that belong to the Human category with the mean height of the humans?
If we replace them with the overall mean, the result will be wrong: machines may be a lot shorter or taller than humans, so we would end up with strange values for the heights of the humans.
P.S. I know how to do it using replace or ifelse, but how do I add the categorization?
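To make the concern concrete, here is a minimal base R sketch on a small hypothetical data frame (toy and its values are made up for illustration, not taken from starwars). The overall mean is pulled down by the short droids, while the Human-only mean is not.

```r
# Hypothetical toy data: two humans with known heights, one human with a
# missing height, and two short droids
toy <- data.frame(
  species = c("Human", "Human", "Human", "Droid", "Droid"),
  height  = c(180L, 170L, NA, 96L, 66L)
)

# Mean over every species: dragged down by the droids
overall_mean <- mean(toy$height, na.rm = TRUE)

# Mean over Human rows only: the value we actually want to impute
human_mean <- mean(toy$height[toy$species == "Human"], na.rm = TRUE)

print(c(overall = overall_mean, human = human_mean))
```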
In the starwars scenario, you can do the following:
library(dplyr)
starwars %>%
  group_by(species) %>%
  # Only Human rows with a missing height get the mean of their group (Human)
  mutate(height = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup()
As the check below shows, height is filled with the mean only for rows whose species is Human:
library(dplyr)
starwars %>%
  group_by(species) %>%
  mutate(newheight = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup() %>%
  select(species, height, newheight) %>%
  filter(is.na(height))
#> # A tibble: 6 x 3
#> species height newheight
#> <chr> <int> <dbl>
#> 1 Human NA 177.
#> 2 Human NA 177.
#> 3 Human NA 177.
#> 4 Human NA 177.
#> 5 Droid NA NA
#> 6 NA NA NA
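Since the question mentions already knowing replace(), here is a base R sketch of the same Human-only fill, offered as an alternative to the dplyr pipeline rather than as part of this answer. It uses a small hypothetical data frame, toy, instead of starwars. The key points are that the mean is computed from Human rows only, and that only rows that are both Human and missing get replaced.

```r
# Hypothetical toy data (not starwars)
toy <- data.frame(
  species = c("Human", "Human", "Human", "Droid", "Droid"),
  height  = c(180L, 170L, NA, 96L, NA)
)

# Rows to fill: Human AND missing height (%in% treats an NA species as FALSE)
to_fill <- toy$species %in% "Human" & is.na(toy$height)

# Mean computed from Human rows only, ignoring their NAs
human_mean <- mean(toy$height[toy$species %in% "Human"], na.rm = TRUE)

# as.double() is not strictly needed in base R (assignment upcasts),
# but it makes the resulting column type explicit
toy$height <- replace(as.double(toy$height), to_fill, human_mean)
print(toy)
```

The Droid row with a missing height stays NA, which matches the starwars output above.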
In this specific example, you need to convert height to a double because it is stored as an integer. if_else() is type-strict (its true and false branches must have the same type), and mean() returns a double, so height has to be converted with as.double() to match. (Since dplyr 1.1.0, if_else() casts both branches to their common type, so the conversion is no longer strictly required, but it does no harm.)
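A quick way to see the type mismatch described above: the height column is an integer, but mean() always returns a double. The base R sketch below uses a hypothetical integer vector in place of starwars$height, checks the types, and shows that as.double() makes both branches agree. Base ifelse() itself is not type-strict; it is used here only to build the filled vector.

```r
heights <- c(172L, 167L, NA)   # hypothetical integer vector, like starwars$height

typeof(heights)                          # "integer"
typeof(mean(heights, na.rm = TRUE))      # "double": mean() returns a double

# After conversion, the "keep" branch and the "mean" branch share one type,
# which is what a type-strict if_else() requires
filled <- ifelse(is.na(heights), mean(heights, na.rm = TRUE), as.double(heights))
typeof(filled)                           # "double"
```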
If I understand you correctly, you just want to replace NAs with group means?
This should do it:
library(dplyr)
data(starwars)
head(starwars)
# Mean height (M) and number of missing heights (NAs) per species
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))
# Replace NAs by group-wise means
# (as.double() because height is an integer and mean() returns a double)
starwars <- starwars %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup()
# Now there are no missing values any more, and the means (M) remain the same
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))
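For comparison, the same group-wise replacement can be sketched in base R with ave(), which applies a function within each group and returns a vector aligned with the input. This is a swapped-in base R technique, not the answer's dplyr code, and it runs on a hypothetical toy data frame. The final check mirrors the remark that the group means do not change after filling.

```r
# Hypothetical toy data (not starwars)
toy <- data.frame(
  species = c("Human", "Human", "Human", "Droid", "Droid", "Droid"),
  height  = c(180, 170, NA, 96, NA, 66)
)

# Mean height per species before filling (NAs ignored)
before <- tapply(toy$height, toy$species, mean, na.rm = TRUE)

# Within each species, replace NAs by that species' mean
fill_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
toy$height <- ave(toy$height, toy$species, FUN = fill_mean)

# Mean height per species after filling (no NAs left to ignore)
after <- tapply(toy$height, toy$species, mean)

print(toy)
stopifnot(!anyNA(toy$height), isTRUE(all.equal(before, after)))
```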
I would use case_when together with replace_na (from tidyr), which was designed for exactly this kind of NA replacement.
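library(dplyr); library(tidyr)  # replace_na() comes from tidyr
output <- starwars %>%
  mutate(height = case_when(species == "Human" ~ replace_na(as.double(height), mean(height[species == "Human"], na.rm = TRUE)),
                            TRUE ~ as.double(height)))  # keep all other rows unchanged
If we are interested in Humans only, we do not need group_by, provided the mean is computed from the Human rows only (height[species == "Human"]). The TRUE ~ branch keeps the heights of all other species, which case_when would otherwise turn into NA.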
If we want this transformation for every group, we could use:
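output <- starwars %>%
  group_by(species) %>%
  # Recent tidyr casts the replacement to the column's type, so convert the integer height to double first
  mutate(height = replace_na(as.double(height), mean(height, na.rm = TRUE))) %>%
  ungroup()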
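One detail worth knowing about case_when(): rows that match no condition become NA, and NA conditions (for example species == "Human" when species itself is NA) are treated as FALSE. That is why a catch-all TRUE ~ branch is needed when only some rows should change. The sketch below uses short hypothetical vectors and a made-up fill value of 175 rather than starwars.

```r
library(dplyr)

# Hypothetical vectors standing in for starwars columns
species <- c("Human", "Droid", NA)
height  <- c(NA, 96, 150)

# Without a catch-all branch, unmatched rows (Droid, NA species) become NA
no_default <- case_when(species == "Human" ~ replace(height, is.na(height), 175))

# With TRUE ~ ..., unmatched rows keep their original values
with_default <- case_when(
  species == "Human" ~ replace(height, is.na(height), 175),
  TRUE ~ height
)

print(no_default)
print(with_default)
```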
We can also use the zoo package with na.aggregate:
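library(dplyr); library(zoo)  # na.aggregate() replaces NAs with an aggregate, the mean by default
output <- starwars %>%
  group_by(species) %>%  # so the mean is taken within each species, not over all rows
  mutate(height = case_when(species == "Human" ~ na.aggregate(as.double(height)),
                            TRUE ~ as.double(height))) %>%  # keep all other rows unchanged
  ungroup()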
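Roughly speaking, na.aggregate() with its default FUN = mean replaces each NA with the mean of the non-missing values (optionally within groups given by its by argument). The base R function below, fill_with_aggregate(), is a hypothetical helper that mimics that default behaviour for a plain numeric vector. It is a sketch for intuition, not zoo's implementation.

```r
# Hypothetical helper mimicking zoo::na.aggregate() defaults on a plain vector:
# every NA is replaced by FUN applied to the non-missing values
fill_with_aggregate <- function(x, FUN = mean) {
  replace(x, is.na(x), FUN(x[!is.na(x)]))
}

human_heights <- c(180, NA, 170, NA)   # hypothetical Human heights
filled <- fill_with_aggregate(human_heights)
print(filled)   # 180 175 170 175

# Any other summary can be swapped in, e.g. the median
fill_with_aggregate(c(1, 2, 30, NA), FUN = median)   # 1 2 30 2
```

Swapping in the median can be useful when a few very tall or very short characters would skew the mean.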
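Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.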