I am trying to compute average scores for responses to different events. My data is in long format, with one row for each event. Here is a sample dataset:
Subject Event R1 R2 R3 R4 Average
1 A 1 2 2 N/A 2.5
1 B 1 1 1 1 1
So the average for event A would be (R1 + R2 + R3)/3, ignoring the N/A, whereas event B has 4 responses. I computed the average for event A in dplyr as:
data$average <- data%>%filter(Event == "A") %>% with(data, (R1 + R2 + R3)/4)
I ran into problems when I tried to do the same for the next event. Thank you for the help!
The following excludes the NA value from the mean calculation (na.rm=TRUE). I also think grouping by Event is important: without group_by, the calculation combines all events and the resulting value is 1.285714 (= 9/7, since the 7 non-missing responses sum to 9).
library(dplyr)
data <- data.frame(
Subject=c(1,1),
Event=c('A', 'B'),
R1=c(1,1),
R2=c(2,1),
R3=c(2,1),
R4=c(NA,1)
)
df <- data %>%
group_by(Event) %>%
mutate(Average = mean(c(R1,R2,R3,R4), na.rm=TRUE))
Output:
Subject Event R1 R2 R3 R4 Average
<dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 A 1 2 2 NA 1.67
2 1 B 1 1 1 1 1
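To check the 9/7 figure mentioned above, here is a small base-R sketch (no dplyr needed) that computes both the pooled mean and one mean per event from the same sample data. Using split() and sapply() is a swapped-in base-R technique for illustration, not part of the original answer.

```r
# Same sample data as in the answer above
data <- data.frame(
  Subject = c(1, 1),
  Event   = c("A", "B"),
  R1 = c(1, 1), R2 = c(2, 1), R3 = c(2, 1), R4 = c(NA, 1)
)
resp_cols <- c("R1", "R2", "R3", "R4")

# Pooling every response (what happens without group_by): 9 / 7 non-missing values
pooled <- mean(unlist(data[resp_cols]), na.rm = TRUE)

# One mean per event: A = (1 + 2 + 2) / 3, B = (1 + 1 + 1 + 1) / 4
per_event <- sapply(split(data[resp_cols], data$Event),
                    function(d) mean(unlist(d), na.rm = TRUE))

print(pooled)
print(per_event)
```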
You don't need to filter for each event one at a time; dplyr can process all rows in a single pipeline. Also, when using dplyr you don't need to assign the result from outside the pipeline, as in data$average <- (something). You can use mutate() instead. So the idiomatic dplyr syntax would be:
library(dplyr)
data <- data %>%
  rowwise() %>%   # compute mean() within each row, not across all rows
  mutate(average = mean(c(R1, R2, R3, R4), na.rm = TRUE)) %>%
  ungroup()
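If there are many response columns, listing them inside c() gets tedious. A possible variant, assuming dplyr 1.0.0 or later (where c_across() was introduced), selects the columns with a tidyselect helper inside a rowwise() pipeline. The starts_with("R") pattern is an assumption that only the response columns start with "R" (it is case-insensitive by default).

```r
library(dplyr)  # c_across() needs dplyr >= 1.0.0

data <- data.frame(
  Subject = c(1, 1),
  Event   = c("A", "B"),
  R1 = c(1, 1), R2 = c(2, 1), R3 = c(2, 1), R4 = c(NA, 1)
)

data <- data %>%
  rowwise() %>%                        # each row becomes its own group
  mutate(average = mean(c_across(starts_with("R")), na.rm = TRUE)) %>%
  ungroup()                            # return a plain (ungrouped) tibble

print(data)
```

Event A gets (1 + 2 + 2)/3 and event B gets 1, because mean() only sees the current row's values.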
You can use rowMeans to calculate means for each row of a data frame. Specify in the input which columns you want to include. To ignore the NA, set na.rm=TRUE.
data$Average <- rowMeans(data[,c("R1", "R2", "R3", "R4")], na.rm=TRUE)
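To show why na.rm=TRUE matters here, the sketch below compares rowMeans() with and without it on the sample data, and uses rowSums() on !is.na() to count how many responses each row's mean is based on. The variable names are illustrative only.

```r
data <- data.frame(
  Subject = c(1, 1), Event = c("A", "B"),
  R1 = c(1, 1), R2 = c(2, 1), R3 = c(2, 1), R4 = c(NA, 1)
)
resp <- data[, c("R1", "R2", "R3", "R4")]

no_rm   <- rowMeans(resp)                # event A is NA: one missing value spoils its row
with_rm <- rowMeans(resp, na.rm = TRUE)  # event A = 5/3, event B = 1
n_resp  <- rowSums(!is.na(resp))         # 3 and 4 responses, the denominators used above

print(cbind(no_rm, with_rm, n_resp))
```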
If you had lots of columns to average and didn't want to type them all out, you could use grep to match the names of data against a pattern. Say, for example, you want to average all the columns containing an "R" in their name:
data$Average <- rowMeans(data[,grep("R",names(data))], na.rm=TRUE)
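A plain "R" pattern matches any column name containing a capital R, which can pull in unrelated columns. The sketch below adds a hypothetical Remark column to show this, then uses an anchored regular expression, "^R[0-9]+$", which assumes the response columns are always named R followed by digits.

```r
data <- data.frame(
  Subject = c(1, 1), Event = c("A", "B"),
  R1 = c(1, 1), R2 = c(2, 1), R3 = c(2, 1), R4 = c(NA, 1),
  Remark = c("ok", "ok")   # hypothetical extra column whose name also contains "R"
)

# Unanchored: matches R1..R4 and also Remark, which would make rowMeans() fail
loose <- grep("R", names(data), value = TRUE)

# Anchored: only names that are exactly "R" followed by one or more digits
resp_cols <- grep("^R[0-9]+$", names(data), value = TRUE)
data$Average <- rowMeans(data[, resp_cols], na.rm = TRUE)

print(loose)
print(data$Average)
```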
Just to complement the previous answers: if you have many columns named R1, R2, ..., R100, instead of writing all of them into the mean function, you might be interested in reshaping your data frame into a longer format with the pivot_longer function, then grouping by Event and calculating the mean. Finally, pivot_wider gets your data frame back into the initial wider format.
library(dplyr)
library(tidyr)
df %>%
  mutate(across(starts_with("R"), as.numeric)) %>%  # "N/A" strings become NA (with a coercion warning)
pivot_longer(cols = starts_with("R"), names_to = "R", values_to = "Values") %>%
group_by(Event) %>%
mutate(average = mean(Values, na.rm = TRUE)) %>%
pivot_wider(names_from = R, values_from = Values)
# A tibble: 2 x 8
# Groups: Event [2]
Subject Event Average average R1 R2 R3 R4
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 A 2.5 1.67 1 2 2 NA
2 1 B 1 1 1 1 1 1
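If the real data has several subjects per event, grouping by Event alone would pool all subjects' responses together. A possible variant, assuming you want one average per subject and event, summarises in long format and joins the result back onto the wide data. It assumes dplyr 1.0.0 or later for the .groups argument of summarise().

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  Subject = c(1, 1), Event = c("A", "B"),
  R1 = c(1, 1), R2 = c(2, 1), R3 = c(2, 1), R4 = c(NA, 1)
)

averages <- data %>%
  pivot_longer(cols = R1:R4, names_to = "R", values_to = "Values") %>%
  group_by(Subject, Event) %>%                     # one mean per subject and event
  summarise(average = mean(Values, na.rm = TRUE), .groups = "drop")

# Attach the averages back onto the original wide rows
result <- left_join(data, averages, by = c("Subject", "Event"))
print(result)
```

Joining instead of calling pivot_wider keeps the original wide columns untouched, so only the new average column is added.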
As mentioned by @TTS, there is something wrong in your calculation of the average for event A: ignoring the N/A, it should be (1 + 2 + 2)/3 = 1.67, not 2.5.
Reproducible example:
structure(list(Subject = c(1L, 1L), Event = c("A", "B"), R1 = c(1L,
1L), R2 = 2:1, R3 = 2:1, R4 = c("N/A", "1"), Average = c(2.5,
1)), row.names = c(NA, -2L), class = c("data.table", "data.frame"
))
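In this example R4 is stored as the character string "N/A", which is why the answer above converts the columns with as.numeric(). A possible alternative, assuming the data is read from text like the table in the question, is to let read.table() treat "N/A" as missing at read time via its na.strings argument. The question's Average column is left out here because it holds the miscomputed 2.5.

```r
# The table from the question, pasted as text (without the Average column)
txt <- "Subject Event R1 R2 R3 R4
1 A 1 2 2 N/A
1 B 1 1 1 1"

# na.strings turns the literal "N/A" into a real NA, so R4 is read as numeric
data <- read.table(text = txt, header = TRUE, na.strings = "N/A")

data$Average <- rowMeans(data[, c("R1", "R2", "R3", "R4")], na.rm = TRUE)
str(data)
```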