
Averages of different lengths in R

I am trying to compute average scores for responses to different events. My data is in long format with one row for each event. Here is a sample dataset:

Subject  Event   R1  R2 R3 R4   Average
1        A       1   2  2  N/A   2.5
1        B       1   1  1  1     1

So the average for event A would be (R1 + R2 + R3)/3, ignoring the N/A, whereas event B has 4 responses. I computed the average for event A in dplyr as:

data$average <- data%>%filter(Event == "A") %>% with(data, (R1 + R2 + R3)/4) 

I ran into problems when I tried to do the same for the next event... Thank you for the help!

The following excludes the NA value from the mean calculation (na.rm = TRUE). I also think grouping by Event is important: without group_by(), the calculation pools all events, and every row gets 1.285714 (= 9/7, the sum of the 7 non-missing responses divided by 7).

library(dplyr)

data <- data.frame(
  Subject=c(1,1),
  Event=c('A', 'B'),
  R1=c(1,1),
  R2=c(2,1),
  R3=c(2,1),
  R4=c(NA,1)
)

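# Group by Event so that mean() only pools the responses of one event;
# na.rm = TRUE drops the missing R4 value of event A.
df <- data %>%
  group_by(Event) %>%
  mutate(Average = mean(c(R1, R2, R3, R4), na.rm = TRUE))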

Output:

Subject Event    R1    R2    R3    R4 Average
    <dbl> <fct> <dbl> <dbl> <dbl> <dbl>   <dbl>
1       1 A         1     2     2    NA    1.67
2       1 B         1     1     1     1    1   
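To see the difference described above, here is a small sketch that runs the same mutate() call with and without group_by() on the example data. It only restates the answer's own data and code side by side.

```r
library(dplyr)

data <- data.frame(
  Subject = c(1, 1),
  Event   = c("A", "B"),
  R1 = c(1, 1),
  R2 = c(2, 1),
  R3 = c(2, 1),
  R4 = c(NA, 1)
)

# Without group_by(): mean() sees the whole R1..R4 columns at once,
# i.e. all 7 non-missing values from both events.
ungrouped <- data %>%
  mutate(Average = mean(c(R1, R2, R3, R4), na.rm = TRUE))

# With group_by(Event): mean() only sees the values of the current event.
grouped <- data %>%
  group_by(Event) %>%
  mutate(Average = mean(c(R1, R2, R3, R4), na.rm = TRUE)) %>%
  ungroup()

ungrouped$Average  # 9/7 on both rows
grouped$Average    # 5/3 for event A, 1 for event B
```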

You don't need to filter one event at a time: dplyr can process all rows in a single pipeline. Also, when using dplyr, you don't need to assign to a column from outside the pipeline, as in data$average <- (something). You can use mutate() instead. So the idiomatic dplyr syntax would be:

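library(dplyr)

data <-
  data %>%
  rowwise() %>%  # without this, mean() would pool whole columns (9/7 on every row)
  mutate(average = mean(c(R1, R2, R3, R4), na.rm = TRUE)) %>%
  ungroup()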
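Without any grouping, mutate() evaluates mean() over entire columns, so every row would receive the pooled mean. One way to get a per-row mean in dplyr (version 1.0.0 or later) is rowwise() combined with c_across(), which also lets you pick the columns with a range such as R1:R4 instead of listing them. A minimal sketch, assuming the response columns sit next to each other:

```r
library(dplyr)

data <- data.frame(
  Subject = c(1, 1),
  Event   = c("A", "B"),
  R1 = c(1, 1),
  R2 = c(2, 1),
  R3 = c(2, 1),
  R4 = c(NA, 1)
)

# rowwise() makes each row its own group, so mean() only sees that row's
# values; c_across(R1:R4) collects the columns from R1 through R4.
data <- data %>%
  rowwise() %>%
  mutate(average = mean(c_across(R1:R4), na.rm = TRUE)) %>%
  ungroup()  # drop the rowwise grouping for later steps

data$average  # 5/3 for event A, 1 for event B
```

Row-wise operations are slower than vectorised ones on large data, so rowMeans() may be preferable when speed matters.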

You can use rowMeans() to calculate the mean of each row of a data frame. Specify in the input which columns to include, and set na.rm = TRUE to ignore the NA.

data$Average <- rowMeans(data[,c("R1", "R2", "R3", "R4")], na.rm=TRUE)
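rowMeans() only accepts numeric input, so if the file really contains the text N/A (as the question's table suggests), that column is read as character and rowMeans() stops with an error. A sketch that reads the question's table (without its Average column) and lets read.table() convert the "N/A" entries to NA on the way in:

```r
# The question's table, read as text; na.strings = "N/A" turns the literal
# "N/A" entries into real NA values so the R columns stay numeric.
data <- read.table(
  text = "
Subject  Event   R1  R2 R3 R4
1        A       1   2  2  N/A
1        B       1   1  1  1",
  header = TRUE,
  na.strings = "N/A"
)

# na.rm = TRUE skips the NA in the row for event A.
data$Average <- rowMeans(data[, c("R1", "R2", "R3", "R4")], na.rm = TRUE)
data$Average  # 5/3 for event A, 1 for event B
```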

If you had many columns to average and didn't want to type them all out, you could use grep() to match the names of data against a pattern. For example, to average all the columns whose names contain an "R":

data$Average <- rowMeans(data[,grep("R",names(data))], na.rm=TRUE)
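The pattern "R" matches any column name containing a capital R anywhere. A stricter, anchored pattern is a safer sketch if the response columns are always named R followed by digits (an assumption about the real data); the extra column Rater below is hypothetical and only there to show what the loose pattern would also pick up:

```r
data <- data.frame(
  Subject = c(1, 1),
  Event   = c("A", "B"),
  R1 = c(1, 1),
  R2 = c(2, 1),
  R3 = c(2, 1),
  R4 = c(NA, 1),
  Rater = c(9, 9)  # hypothetical extra column whose name also contains "R"
)

grep("R", names(data), value = TRUE)  # also returns "Rater"

# Anchoring the pattern keeps only names that are "R" followed by digits.
response_cols <- grep("^R[0-9]+$", names(data), value = TRUE)

data$Average <- rowMeans(data[, response_cols], na.rm = TRUE)
```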

To complement the previous answers: if you have many columns named R1, R2, ..., R100, then instead of writing all of them into the mean() call, you may want to reshape your data frame into a longer format with pivot_longer(), then group by Event and calculate the mean. Finally, pivot_wider() brings the data frame back into its original wide format.

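library(dplyr)
library(tidyr)

# df is the question's data (see the reproducible example below), where R4
# holds the text "N/A"; as.numeric() turns it into NA (with a coercion warning).
df %>%
  mutate(across(starts_with("R"), as.numeric)) %>%
  pivot_longer(cols = starts_with("R"), names_to = "R", values_to = "Values") %>%
  group_by(Event) %>%
  mutate(average = mean(Values, na.rm = TRUE)) %>%
  pivot_wider(names_from = R, values_from = Values)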

# A tibble: 2 x 8
# Groups:   Event [2]
  Subject Event Average average    R1    R2    R3    R4
    <int> <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
1       1 A         2.5    1.67     1     2     2    NA
2       1 B         1      1        1     1     1     1
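Grouping by Event alone works here because each event appears on only one row. If the real long-format data holds several subjects (one row per subject and event), group_by(Event) would pool their answers. A sketch with made-up values for a hypothetical second subject, grouping by both Subject and Event:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: a second subject, so each Event appears on two rows.
df <- data.frame(
  Subject = c(1, 1, 2, 2),
  Event   = c("A", "B", "A", "B"),
  R1 = c(1, 1, 3, 4),
  R2 = c(2, 1, 3, 4),
  R3 = c(2, 1, 3, 4),
  R4 = c(NA, 1, NA, 4)
)

result <- df %>%
  pivot_longer(cols = starts_with("R"), names_to = "R", values_to = "Values") %>%
  # Grouping by Subject as well as Event keeps each subject's responses apart;
  # group_by(Event) alone would give both subjects 14/6 for event A.
  group_by(Subject, Event) %>%
  mutate(average = mean(Values, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = R, values_from = Values)

result$average  # 5/3, 1, 3, 4
```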

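As mentioned by @TTS, the average for event A in your data is wrong: with R4 missing, it should be (1 + 2 + 2)/3 = 5/3 ≈ 1.67, not 2.5. That is why the output above shows both your original Average column and the recomputed average.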

Reproducible example

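# dput() of the question's data, assigned to df for the code above. The original
# object was a data.table; its .internal.selfref <pointer: ...> entry cannot be
# pasted back into R, so it is dropped and a plain data.frame is used instead.
df <- structure(list(Subject = c(1L, 1L), Event = c("A", "B"), R1 = c(1L, 
1L), R2 = 2:1, R3 = 2:1, R4 = c("N/A", "1"), Average = c(2.5, 
1)), row.names = c(NA, -2L), class = "data.frame")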
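The dput() output shows that R4 is stored as the text "N/A" rather than as a missing value, which is why the pivot_longer() answer converts the columns with as.numeric() (R then warns that NAs were introduced by coercion). A sketch that converts the placeholder explicitly with dplyr's na_if() first, using the question's data without its Average column:

```r
library(dplyr)

# The question's data, with R4 stored as text exactly as in the dput() output.
df <- data.frame(
  Subject = c(1L, 1L),
  Event   = c("A", "B"),
  R1 = c(1L, 1L),
  R2 = 2:1,
  R3 = 2:1,
  R4 = c("N/A", "1"),
  stringsAsFactors = FALSE  # keep R4 as character on older R versions too
)

# na_if() turns the "N/A" strings into NA before the numeric conversion,
# so as.numeric() has nothing left to coerce and does not warn.
df <- df %>%
  mutate(R4 = as.numeric(na_if(R4, "N/A")))

df$Average <- rowMeans(df[, c("R1", "R2", "R3", "R4")], na.rm = TRUE)
```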

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.