Summarise numeric columns, return last value of non-numeric

It's not uncommon to want to summarise the numeric columns of a data frame or tibble while doing something else to the non-numeric columns.

There is a nice trick for this here, but it seems to fail for character columns.

First, here it is working, returning the mean of the numeric columns and the value in the last row of the other columns:

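library(dplyr)  # provides %>%, group_by(), summarise_all(), if_else(), last(), as_tibble()

set.seed(1234)
category <- c("A", "A", "E", "E", "B", "B", "C")
date <- seq(as.Date("2017-01-01"), by = "month", length.out = 7)
value1 <- sample(seq(from = 91, to = 97, by = 1))
dt <- as_tibble(data.frame(category, date, value1))

# Works: mean for numeric columns, last value otherwise.
# (funs() was current when this was written; it has since been deprecated in dplyr.)
dt2 <- dt %>%
  group_by(category) %>%
  summarise_all(funs(if_else(is.numeric(.), mean(.), last(.))))
print(dt2)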

Note that because the date column is non-numeric, it returns the value in the last row instead of the mean:

# A tibble: 4 x 3
  category date       value1
  <fct>    <date>      <dbl>
1 A        2017-02-01   92.5
2 B        2017-06-01   93.5
3 C        2017-07-01   97  
4 E        2017-04-01   94.5

However, it fails when one of the columns is of type chr (character):

marsupial <- c("quoll", "phascogale", "triok", "opossum",
               "antechinus", "bandicoot", "Fat-tailed dunnart")
dt$marsupial <- marsupial

# Doesn't work once a character column is present
dt3 <- dt %>%
  group_by(category) %>%
  summarise_all(funs(if_else(is.numeric(.), mean(.), last(.))))
print(dt3)

This gives the following error and warning:

Error in summarise_impl(.data, dots) : 
  Evaluation error: `false` must be type double, not character.
In addition: Warning message:
In mean.default(marsupial) :
  argument is not numeric or logical: returning NA

I assume that "`false` must be type double" refers to the marsupial column, which results in an attempt to evaluate last. If so, why must it be double, and is there another way? I wouldn't expect this from a conventional if/else conditional.

if_else seems to be the problem, so I have written a function instead. I have updated my answer: I have tested it on the date attributes, and it seems to work on the list as well. I hope it solves your problem:

dt %>%
  group_by(category) %>%
  summarise_all(function(x) {
    if (is.numeric(x)) {
      mean(x)
    } else {
      nth(x, -1)  # last element, equivalent to last(x)
    }
  })
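As a rough sketch of why the plain function works where if_else does not: if_else is vectorised, so both of its branches are evaluated up front and they must share a type, whereas a base if/else only evaluates the branch that the length-one condition selects. The vector `x` and the names `strict` and `lazy` below are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for one group's values in the marsupial column
x <- c("quoll", "phascogale", "triok")

# if_else() evaluates mean(x) and last(x) before choosing between them,
# then rejects the pair because one is double (NA) and the other character.
# suppressWarnings() hides the mean() warning; tryCatch() records the error.
strict <- tryCatch(
  suppressWarnings(if_else(is.numeric(x), mean(x), last(x))),
  error = function(e) "error"
)

# Base if/else evaluates only the selected branch, so mean(x) is never called.
lazy <- if (is.numeric(x)) mean(x) else last(x)

print(strict)
print(lazy)
```

The exact wording of the error differs between dplyr versions, so the sketch only records that an error occurred.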

As of 2021, this is the current syntax:

dt %>%
  group_by(category) %>%
  summarise(across(where(is.numeric), mean),   # predicates go inside where()
            across(where(~ !is.numeric(.)), last))
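For anyone who wants to try this without the random sample, here is a self-contained sketch that does the same job in a single across() call, keeping the columns in their original order. The `value1` numbers are hypothetical, chosen only so that the group means match the output shown in the question, and `dt4` is an illustrative name.

```r
library(dplyr)

# Rebuild the example data with fixed (hypothetical) values instead of sample()
dt <- tibble(
  category  = c("A", "A", "E", "E", "B", "B", "C"),
  date      = seq(as.Date("2017-01-01"), by = "month", length.out = 7),
  value1    = c(92, 93, 94, 95, 91, 96, 97),
  marsupial = c("quoll", "phascogale", "triok", "opossum",
                "antechinus", "bandicoot", "Fat-tailed dunnart")
)

# One across() over every non-grouping column: a plain if/else picks
# mean() for numeric columns and last() for everything else.
dt4 <- dt %>%
  group_by(category) %>%
  summarise(across(everything(),
                   ~ if (is.numeric(.x)) mean(.x) else last(.x)))

print(dt4)
```

Splitting the work into two across() calls, as above, reads more explicitly but moves the numeric columns ahead of the others in the result; the single call keeps the input order.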
