Ignore Max Value in Mean Calculation in R

Question

I have the following sample data frame from 20 m sprint testing, with split times for each athlete. Each athlete runs 3 trials. I want to create new columns for each split that average the two fastest trials (dropping the slowest trial).

Here is a sample of the df:

    Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
1 Athlete 1   2.005    1.320   3.325   1.904    1.306   3.210   1.993    1.316   3.309
2 Athlete 2   1.967    1.383   3.350   1.931    1.391   3.322   2.005    1.399   3.404
3 Athlete 3   2.008    1.381   3.389   2.074    1.365   3.439   2.047    1.408   3.455
4 Athlete 4   1.817    1.286   3.103   1.924    1.285   3.209      NA       NA      NA

The end result would be 3 new columns with the mean values of the 2 fastest trials, chosen by the 0_20m time ("Avg_0_10m", "Avg_10_20m", "Avg_0_20m"). Ideally the solution is robust enough to handle NA values, as there will be some in the dataset.

Any suggestions on how to approach this? I'm not sure how to filter out the slowest 0_20m trial, together with its split times, and average the remaining trials.

Answer 1

library(tidyverse)

x <- read.table(text=" Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
'Athlete 1'   2.005    1.320   3.325   1.904    1.306   3.210   1.993    1.316   3.309
'Athlete 2'   1.967    1.383   3.350   1.931    1.391   3.322   2.005    1.399   3.404
'Athlete 3'   2.008    1.381   3.389   2.074    1.365   3.439   2.047    1.408   3.455
'Athlete 4'  1.817    1.286   3.103   1.924    1.285   3.209      NA       NA      NA", header=TRUE, check.names=FALSE)


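x %>%
  # Long format: one row per athlete, column name and time
  gather(trial, time, -Athlete) %>%
  # Split e.g. "0_10m_1" into split "0_10m" and trial number "1"
  separate(trial, sep = "(?<=m)_", into = c("trial_time", "trial_try")) %>%
  group_by(Athlete, trial_time) %>%
  group_split() %>%
  purrr::map(function(x) {
    x %>%
      # Sort ascending; NA times end up last
      arrange(time) %>%
      group_by(Athlete, trial_time) %>%
      # Average the two smallest times of this split
      summarise(time_avg = mean(time[1:2], na.rm = TRUE))
  }) %>%
  bind_rows() %>%
  # Back to wide format: one column per split
  spread(trial_time, time_avg)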
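
Note that the pipeline above ranks the times of each split on its own, so the two values averaged for 0_10m need not come from the same trials as the two fastest 0_20m times. If the trials should instead be chosen by the 0_20m time, as the question describes, something along these lines might work. It is only a sketch: it assumes dplyr >= 1.0.0 and tidyr >= 1.0.0 (for slice_min(), across() and pivot_longer()), and avg_two_fastest is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, column names kept verbatim.
x <- read.table(text = "Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
'Athlete 1' 2.005 1.320 3.325 1.904 1.306 3.210 1.993 1.316 3.309
'Athlete 2' 1.967 1.383 3.350 1.931 1.391 3.322 2.005 1.399 3.404
'Athlete 3' 2.008 1.381 3.389 2.074 1.365 3.439 2.047 1.408 3.455
'Athlete 4' 1.817 1.286 3.103 1.924 1.285 3.209 NA NA NA",
                header = TRUE, check.names = FALSE)

avg_two_fastest <- x %>%
  # One row per athlete and trial, one column per split.
  pivot_longer(-Athlete,
               names_to = c(".value", "trial"),
               names_pattern = "(.*)_(\\d+)$") %>%
  # Drop trials that were not run (NA total time).
  filter(!is.na(`0_20m`)) %>%
  group_by(Athlete) %>%
  # Keep the two fastest trials, ranked by the 0_20m time.
  slice_min(`0_20m`, n = 2, with_ties = FALSE) %>%
  # Average every split over those same two trials.
  summarise(across(c(`0_10m`, `10_20m`, `0_20m`), mean, .names = "Avg_{.col}"))

avg_two_fastest
```

With the sample data the difference shows up for Athlete 3: the two fastest 0_20m times come from trials 1 and 2, so Avg_0_10m is the mean of 2.008 and 2.074 (2.041), whereas ranking the 0_10m times independently averages 2.008 and 2.047.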

Answer 2

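First, create the data frame. The header line has one field fewer than the data rows, so read.table() uses the leading 1 to 4 as row names; column x then holds the word "Athlete" and column Athlete holds the athlete number.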

x <- read.table(text="x Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
1 Athlete 1   2.005    1.320   3.325   1.904    1.306   3.210   1.993    1.316   3.309
2 Athlete 2   1.967    1.383   3.350   1.931    1.391   3.322   2.005    1.399   3.404
3 Athlete 3   2.008    1.381   3.389   2.074    1.365   3.439   2.047    1.408   3.455
4 Athlete 4   1.817    1.286   3.103   1.924    1.285   3.209      NA       NA      NA", header = TRUE, check.names = FALSE)


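library(tidyverse)

x %>% select(-x) %>%                             # drop the column holding the literal "Athlete"
   gather("split", "time", -Athlete) %>%         # long format: one row per athlete, column and time
   mutate(split = gsub("_\\d$", "", split)) %>%  # strip the trial number, e.g. "0_10m_1" -> "0_10m"
   group_by(Athlete, split) %>%
   arrange(time) %>%                             # ascending; NA times sort last
   slice(1:2) %>%                                # two smallest times per athlete and split
   summarize(Avg = mean(time))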
# A tibble: 12 x 3
# Groups:   Athlete [4]
#   Athlete split    Avg
#     <int> <chr>  <dbl>
# 1       1 0_10m   1.95
# 2       1 0_20m   3.26
# 3       1 10_20m  1.31
# 4       2 0_10m   1.95
# 5       2 0_20m   3.34
# 6       2 10_20m  1.39
# 7       3 0_10m   2.03
# 8       3 0_20m   3.41
# 9       3 10_20m  1.37
#10       4 0_10m   1.87
#11       4 0_20m   3.16
#12       4 10_20m  1.29
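
The result above is in long format, while the question asks for three columns. One possible way to reshape it is tidyr's pivot_wider() (tidyr >= 1.0.0). In this sketch res_long and res_wide are illustrative names, and the values are simply copied from the rounded output printed above.

```r
library(tidyr)

# Long result in the shape printed above (rounded values copied from that output).
res_long <- data.frame(
  Athlete = rep(1:4, each = 3),
  split   = rep(c("0_10m", "0_20m", "10_20m"), times = 4),
  Avg     = c(1.95, 3.26, 1.31,
              1.95, 3.34, 1.39,
              2.03, 3.41, 1.37,
              1.87, 3.16, 1.29)
)

# One row per athlete, one "Avg_" column per split.
res_wide <- pivot_wider(res_long,
                        names_from   = split,
                        values_from  = Avg,
                        names_prefix = "Avg_")

res_wide
```

In the pipeline itself, the same pivot_wider() call could be appended after summarize() instead of going through an intermediate object.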

2 answers

solution1
1 ACCEPTED 2020-01-15 16:43:55

solution2
0 2020-01-15 16:50:55
