How to check if NA within replace() function in R?

In my dataset, the duration of an activity is given either in hours (column duration_hours) or in minutes (column duration_minutes). If it is given in hours, the duration_minutes column is empty (NA), and vice versa.
I now want to convert the values given in minutes into hours by dividing them by 60.

To do so, I tried this command:

df <- df %>% mutate(duration_recoded = replace(duration_minutes, !is.na(duration_minutes), duration_minutes / 60))

However, the command produces incorrect results and shows this warning message:

Warning message:
In x[list] <- values :
  number of items to replace is not a multiple of replacement length

Can anybody tell me where my mistake is?

Here's some sample data:

df <- structure(list(duration_hours = c(1, NA, 2, NA, 1), duration_minutes = c(NA, 25, NA, 30, NA)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

We can make use of the coalesce() function from the dplyr package here:

library(dplyr)
df <- df %>% mutate(duration_recoded = coalesce(duration_hours, duration_minutes / 60))

This should work because if duration_hours is not NA, coalesce() simply takes it and assigns it to duration_recoded. If duration_hours is NA, coalesce() skips it and takes duration_minutes divided by 60 instead.

The problem in your code is that duration_minutes is a vector, so dividing it by 60 is a vectorized operation that returns a vector of the same length. Let's use an example df:

# A tibble: 7 x 1
  duration_minutes
             <dbl>
1               10
2               20
3               30
4               NA
5               50
6               NA
7               60

In this case, df$duration_minutes / 60 results in:

0.1666667 0.3333333 0.5000000        NA 0.8333333        NA 1.0000000

That means you are trying to fill the five non-NA positions with a replacement vector of seven values. (Your condition is !is.na(duration_minutes), so it is the non-NA values, not the NA values, that get replaced.) R uses only the first five replacement values, in order, so the results no longer line up with their original rows, and it warns that number of items to replace is not a multiple of replacement length.
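One way to make the lengths agree is to subset the replacement values with the same logical index used to pick the positions. The following is a minimal base R sketch using the example vector above; the variable names has_value and duration_recoded are chosen here for illustration.

```r
# Example vector from the answer above: 7 values, 2 of them NA
duration_minutes <- c(10, 20, 30, NA, 50, NA, 60)

# Positions that hold an actual value (5 of the 7)
has_value <- !is.na(duration_minutes)

# Subset the replacement with the same index, so 5 values fill 5 positions
duration_recoded <- replace(
  duration_minutes,
  has_value,
  duration_minutes[has_value] / 60
)

print(duration_recoded)
```

Because NA / 60 is itself NA, plain duration_minutes / 60 yields the same vector here; replace() is only needed when the untouched positions should keep a different value.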

You either have to use some function that aggregates multiple values to a single value (such as sum(), mean(), first(), etc.), or you have to select a single value to act as the replacement, or you have to supply exactly as many replacement values as there are positions to replace. The coalesce() function avoids the problem because it works element-wise: at each position it returns the first non-missing value among its arguments.
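To illustrate the "first non-missing value per position" idea without dplyr, here is a rough base R sketch applied to the sample data from the question. The helper first_non_missing() is hypothetical and is not how dplyr implements coalesce(); it also assumes all vectors have the same length.

```r
# Hypothetical helper (not dplyr's implementation): for each position,
# keep the value from the earlier vector unless it is NA, in which case
# fall back to the value from the next vector.
first_non_missing <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

# Sample data from the question
duration_hours   <- c(1, NA, 2, NA, 1)
duration_minutes <- c(NA, 25, NA, 30, NA)

# Hours where available, otherwise minutes converted to hours
duration_recoded <- first_non_missing(duration_hours, duration_minutes / 60)
print(duration_recoded)
```

Every argument is a full-length vector and the choice is made row by row, so no length mismatch can arise.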
