Use of if_else() function in conjunction with group_by, all() and is.na() in R

I searched the forum for a problem similar to mine, but could not find one that matches it exactly.

I have an R data frame with a grouping column and value columns of types such as double and Date. I want to write a function that groups the data frame and creates a new column that (1) returns NA if the group's values are all NA, or (2) returns, say, the maximum if the group contains at least one non-NA value. I have attempted the following:

library(dplyr)
a <- c("A", "A", "B", "B", "C", "C")
b <- c(1,2,NA,NA,NA,6)
c <- as.Date(c("2021-01-01", "2021-01-02", NA,
           NA, NA, "2021-01-06"))
df <- data.frame("Group" = a, "Value" = b, "Date" = c)

take_max <- function(data, group, value, new_col_name, fun) {
  data %>% group_by({{ group }}) %>% 
    mutate({{ new_col_name }} := if_else(
      all(is.na({{ value }})),
      fun(NA),
      max({{ value }}, na.rm = TRUE)
    ))
}

df %>% take_max(Group, Date, min_max, fun = as.Date)
df %>% take_max(Group, Value, min_max, fun = as.numeric)

It seems to work, but I get the following warning:

Warning messages:
1: Problem with `mutate()` input `new_col`.
i no non-missing arguments to max; returning -Inf
i Input `new_col` is `if_else(all(is.na(Value)), fun(NA), max(Value, na.rm = TRUE))`.
i The error occurred in group 2: Group = "B".
2: In max(~Value, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

My understanding of the problem is that if_else evaluates both options for every group so that it can check that they have the same type. For group B it therefore still computes max({{ value }}, na.rm = TRUE), which in this case is equivalent to max(c()), even though the result is discarded in favour of fun(NA). I tried to replace if_else with ifelse, but then the Date type is not preserved.

Would anyone have an idea of how to handle that?

Try this:

take_max <- function(data, group, value, new_col_name){
  data %>% 
    group_by({{group}}) %>% 
    mutate({{new_col_name}} := if(all(is.na({{value}}))) NA else max({{value}}, na.rm = TRUE))
}

take_max(df, Group, Value, min_max)
take_max(df, Group, Date, min_max)
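
The warning from the question does not appear here because a plain if/else only evaluates the branch it selects, whereas if_else() is an ordinary function whose `true` and `false` arguments are both evaluated. A minimal sketch of the difference, using a standalone all-NA vector `x` (a made-up example, not part of the question's data):

```r
library(dplyr)

# A "group" that contains only missing values (illustrative data)
x <- c(NA_real_, NA_real_)

# if_else(): both `true` and `false` are computed before one is picked,
# so max() runs on an empty set and emits the -Inf warning.
r1 <- if_else(all(is.na(x)), NA_real_, max(x, na.rm = TRUE))

# if/else: only the selected branch is evaluated, so max() is never called
# for this vector and no warning is raised.
r2 <- if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
```

Both `r1` and `r2` end up as NA; only the first line produces the warning.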

If you don't want multiple records per group, you can replace mutate with summarise.
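
As a rough sketch of that variant, the function below is the same as take_max with mutate swapped for summarise. The name `summarise_max` is hypothetical and chosen here only for illustration, and the data frame repeats the Group and Value columns from the question:

```r
library(dplyr)

# Same Group/Value data as in the question
df <- data.frame(
  Group = c("A", "A", "B", "B", "C", "C"),
  Value = c(1, 2, NA, NA, NA, 6)
)

# Hypothetical helper: take_max() with mutate() replaced by summarise(),
# so each group collapses to a single row.
summarise_max <- function(data, group, value, new_col_name) {
  data %>%
    group_by({{ group }}) %>%
    summarise(
      # NA when the whole group is missing, otherwise the group maximum
      {{ new_col_name }} := if (all(is.na({{ value }}))) NA else max({{ value }}, na.rm = TRUE)
    )
}

res <- summarise_max(df, Group, Value, min_max)
```

Here `res` should contain one row per group, with NA for group B because all of its values are missing.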
