Some NA values and not all

Assume that you have a dataset like starwars, with two columns: a numeric one (height) that contains 20 NA values, and one holding the species (Human, Droid, machine, etc.).

Using pipes, how can I replace only the NA values that belong to the Human category with the mean height of humans?

Replacing them with the overall mean would be wrong: machines may be a lot shorter or taller than humans, so the filled-in human heights would come out with strange values.

PS: I know how to do this with replace or ifelse, but how do I add the condition on the category?

In the starwars scenario, you can do the following:

library(dplyr)

starwars %>% 
  group_by(species) %>% 
  mutate(height = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>% 
  ungroup()

As the output below shows, height is filled with the average only where the species is Human:

library(dplyr)

starwars %>% 
  group_by(species) %>% 
  mutate(newheight = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>% 
  ungroup() %>% 
  select(species, height, newheight) %>% 
  filter(is.na(height))

#> # A tibble: 6 x 3
#>   species height newheight
#>   <chr>    <int>     <dbl>
#> 1 Human       NA      177.
#> 2 Human       NA      177.
#> 3 Human       NA      177.
#> 4 Human       NA      177.
#> 5 Droid       NA       NA 
#> 6 NA          NA       NA 

In this specific example, you need to convert height to a double because it is stored as an integer: if_else is type-consistent, and mean returns a double, so both branches must have the same type.
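A quick check of the types involved makes the coercion easier to see. This is a minimal sketch that assumes the starwars data shipped with dplyr; the variable names are only illustrative. Newer dplyr releases may be more lenient about mixing integer and double in if_else, so the explicit as.double() is the conservative choice.

```r
library(dplyr)

# height is stored as an integer column in starwars
height_type <- typeof(starwars$height)

# mean() returns a double, even for integer input
mean_type <- typeof(mean(starwars$height, na.rm = TRUE))

# converting explicitly puts both branches of if_else() on the same type
converted_type <- typeof(as.double(starwars$height))

print(c(height = height_type, mean = mean_type, converted = converted_type))
```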

If I understand you correctly, you just want to replace NAs with group means?

This should do:

library(dplyr)

data(starwars)

head(starwars)

# Mean height (M) and number of missing heights (NAs) per species;
# this shows one missing value for "Droid"
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))

# Replace NAs with the group-wise means
starwars <- starwars %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup()

# Now there are no missing values any more, and the means (M) remain the same
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))
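The claim that the group means stay the same after imputation can be checked on data small enough to verify by hand. The sketch below uses a hypothetical toy tibble (the values are made up for illustration, not taken from starwars): filling a gap with the group mean adds a value equal to that mean, so the mean itself does not move.

```r
library(dplyr)

# Hypothetical toy data: one missing height per species
toy <- tibble(
  species = c("Human", "Human", "Human", "Droid", "Droid", "Droid"),
  height  = c(170L, 180L, NA, 90L, 110L, NA)
)

# Group means before imputation (NAs ignored)
before <- toy %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE))

# Replace each NA with the mean of its own species
filled <- toy %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup()

# Group means after imputation
after <- filled %>%
  group_by(species) %>%
  summarize(M = mean(height))

print(filled)
print(all.equal(before$M, after$M))
```

Here the Human gap is filled with 175 and the Droid gap with 100, and the per-species means are identical before and after.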

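I would use case_when together with replace_na (from tidyr), which was designed for these NA-replacing operations.

library(dplyr)
library(tidyr)

# as.double(): mean() returns a double; TRUE ~ height keeps all other rows unchanged
output <- starwars %>% 
    mutate(height = as.double(height),
           height = case_when(species == "Human" ~ replace_na(height, mean(height[species == "Human"], na.rm = TRUE)),
                              TRUE ~ height))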

If we are interested in Humans only, we do not need group_by. If we want this transformation for every group, we could use:

output <- starwars %>% 
        group_by(species) %>%
        # convert to double first so the (double) group mean can be filled in
        mutate(height = replace_na(as.double(height), mean(height, na.rm = TRUE)))
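To make the difference between the Human-only and the every-group variant concrete, here is a small sketch on hypothetical toy data (values invented for illustration). The height column is created as a double from the start, so no type conversion is needed in this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one missing height per species
toy <- tibble(
  species = c("Human", "Human", "Human", "Droid", "Droid", "Droid"),
  height  = c(170, 180, NA, 90, 110, NA)
)

# Variant 1: fill only the Human NAs, using the mean of the Human heights
human_only <- toy %>%
  mutate(height = case_when(
    species == "Human" ~ replace_na(height, mean(height[species == "Human"], na.rm = TRUE)),
    TRUE ~ height  # every other species is left as it is
  ))

# Variant 2: fill every NA with the mean of its own species
every_group <- toy %>%
  group_by(species) %>%
  mutate(height = replace_na(height, mean(height, na.rm = TRUE))) %>%
  ungroup()

print(human_only)
print(every_group)
```

In the first result the Droid gap stays NA while the Human gap becomes 175; in the second, the Droid gap is filled with 100 as well.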

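We can also use the zoo package with na.aggregate, which replaces NAs with the mean by default:

library(dplyr)
library(zoo)

# Grouping by species makes na.aggregate() use the Human mean;
# TRUE ~ ... keeps the heights of all other species unchanged
output <- starwars %>% 
    group_by(species) %>%
    mutate(height = case_when(species == "Human" ~ na.aggregate(as.double(height)),
                              TRUE ~ as.double(height))) %>%
    ungroup()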
