简体   繁体   中英

Ignore Max Value in Mean Calculation in R

I have the following sample df from 20m sprint testing athletes with split times. They do 3 trials. I want to create new columns for each split that average their two fastest trials (drop the slowest trial).

Here is a sample of the df:

    Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
1 Athlete 1   2.005    1.320   3.325   1.904    1.306   3.210   1.993    1.316   3.309
2 Athlete 2   1.967    1.383   3.350   1.931    1.391   3.322   2.005    1.399   3.404
3 Athlete 3   2.008    1.381   3.389   2.074    1.365   3.439   2.047    1.408   3.455
4 Athlete 4   1.817    1.286   3.103   1.924    1.285   3.209      NA       NA      NA

The end result would be 3 new columns with the mean values of the 2 fastest trials (based on the 0_20m time) ("Avg_0_10m", "Avg_10_20m", Avg_0_20m"). Ideally the solution is robust enough to handle NA values as there will be some within the dataset.

Any suggestions on how to approach this? I'm not sure how to be able to filter out the slowest 0_20m trial with the related split times and average the other trials.

library(tidyverse)

x <- read.table(text=" Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
'Athlete 1'   2.005    1.320   3.325   1.904    1.306   3.210   1.993    1.316   3.309
'Athlete 2'   1.967    1.383   3.350   1.931    1.391   3.322   2.005    1.399   3.404
'Athlete 3'   2.008    1.381   3.389   2.074    1.365   3.439   2.047    1.408   3.455
'Athlete 4'  1.817    1.286   3.103   1.924    1.285   3.209      NA       NA      NA", header=TRUE, check.names=FALSE)


x %>%
  gather(trial,time,-Athlete) %>%
  separate(trial, sep = "(?<=m)_", into = c("trial_time", "trial_try")) %>%
  group_by(Athlete, trial_time) %>%
  group_split() %>%
  purrr::map(function(x) {
    x %>%
      arrange(time) %>%
      group_by(Athlete, trial_time) %>%
      summarise(time_avg = mean(time[1:2], na.rm = TRUE))
  }) %>%
  bind_rows() %>%
  spread(trial_time, time_avg)

First to create the data.frame.

x <- read.table(text="x Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
1 Athlete 1   2.005    1.320   3.325   1.904    1.306   3.210   1.993    1.316   3.309
2 Athlete 2   1.967    1.383   3.350   1.931    1.391   3.322   2.005    1.399   3.404
3 Athlete 3   2.008    1.381   3.389   2.074    1.365   3.439   2.047    1.408   3.455
4 Athlete 4   1.817    1.286   3.103   1.924    1.285   3.209      NA       NA      NA", header=T, check.names=F)


x %>% select(-x) %>% 
   gather("split", "time", -Athlete) %>% 
   mutate(split = gsub("_\\d$","", split)) %>% 
   group_by(Athlete, split) %>% 
   arrange(time) %>% 
   slice(1:2) %>% 
   summarize(Avg = mean(time))
# A tibble: 12 x 3
# Groups:   Athlete [4]
#   Athlete split    Avg
#     <int> <chr>  <dbl>
# 1       1 0_10m   1.95
# 2       1 0_20m   3.26
# 3       1 10_20m  1.31
# 4       2 0_10m   1.95
# 5       2 0_20m   3.34
# 6       2 10_20m  1.39
# 7       3 0_10m   2.03
# 8       3 0_20m   3.41
# 9       3 10_20m  1.37
#10       4 0_10m   1.87
#11       4 0_20m   3.16
#12       4 10_20m  1.29

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM