I want to perform calculations on grouped rows in a data frame in R. My go-to for this would be to spread the column and do the calculations on the resulting columns, but I also want to be able to do it without reshaping my data frame. For example, I want to calculate a fold change for varA and varB for each subject, dividing the 'post' timepoint by the 'pre' timepoint, so that data frame df below ends up looking like df_foldchange. The result should be added as new rows whose value in the existing 'timepoint' column is 'foldchange'.
df <- data.frame(subject = c('subject1', 'subject1', 'subject2', 'subject2'),
                 varA = c(1, 2, 1, 3),
                 varB = c(2, 3, 2, 4),
                 timepoint = c('pre', 'post', 'pre', 'post'))
df_foldchange <- data.frame(subject = c('subject1', 'subject1', 'subject1',
                                        'subject2', 'subject2', 'subject2'),
                            varA = c(1, 2, 2, 1, 3, 3),
                            varB = c(2, 3, 1.5, 2, 4, 2),
                            timepoint = c('pre', 'post', 'foldchange',
                                          'pre', 'post', 'foldchange'))
I suspect you've mixed up your 'pre' / 'post' sequence in the construction of df? The way you have it, you don't have a 'post' for 'subject1', or a 'pre' for 'subject2'.
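A quick way to check whether every subject has both timepoints is to cross-tabulate subject against timepoint. This is only a sketch in base R; the names counts and has_both are made up for illustration, and it assumes the only timepoints of interest are 'pre' and 'post'.

```r
# Data frame as constructed in the question
df <- data.frame(subject = c('subject1', 'subject1', 'subject2', 'subject2'),
                 varA = c(1, 2, 1, 3),
                 varB = c(2, 3, 2, 4),
                 timepoint = c('pre', 'post', 'pre', 'post'),
                 stringsAsFactors = FALSE)

# Count rows per subject and timepoint; fixing the factor levels keeps a
# column for 'pre' and 'post' even if one of them never occurs
counts <- table(df$subject, factor(df$timepoint, levels = c('pre', 'post')))
print(counts)

# TRUE only if each subject has exactly one 'pre' row and one 'post' row
has_both <- all(counts == 1)
print(has_both)
```

If has_both is FALSE, the per-subject division of 'post' by 'pre' would not yield exactly one value per subject, so it is worth fixing the input first.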
You could do:
library(dplyr)

df <- data.frame(subject = c('subject1', 'subject1', 'subject2', 'subject2'),
                 varA = c(1, 2, 1, 3),
                 varB = c(2, 3, 2, 4),
                 timepoint = c('pre', 'post', 'pre', 'post'),
                 stringsAsFactors = FALSE)

# one row per subject: 'post' divided by 'pre' for each variable
df1 <- df %>%
  group_by(subject) %>%
  summarise(varA = varA[timepoint == 'post'] / varA[timepoint == 'pre'],
            varB = varB[timepoint == 'post'] / varB[timepoint == 'pre'],
            timepoint = 'foldchange')

# append the fold-change rows to the original data and group them by subject
df_foldchange <- df %>%
  bind_rows(df1) %>%
  arrange(subject)
# output
   subject varA varB  timepoint
1 subject1    1  2.0        pre
2 subject1    2  3.0       post
3 subject1    2  1.5 foldchange
4 subject2    1  2.0        pre
5 subject2    3  4.0       post
6 subject2    3  2.0 foldchange
You could sort the above to get exactly the output you want, if the order is important.
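As a minimal sketch of such a sort, assuming the desired order within each subject is 'pre', 'post', 'foldchange': the rows can be ordered by subject and then by each timepoint's position in an explicit vector. This uses base R only, starts from the unsorted combined rows (values copied from the example above), and the names tp_levels, ord and df_sorted are made up for illustration.

```r
# Combined rows before sorting: original data followed by the fold-change rows
df_foldchange <- data.frame(
  subject = c('subject1', 'subject1', 'subject2', 'subject2',
              'subject1', 'subject2'),
  varA = c(1, 2, 1, 3, 2, 3),
  varB = c(2, 3, 2, 4, 1.5, 2),
  timepoint = c('pre', 'post', 'pre', 'post', 'foldchange', 'foldchange'),
  stringsAsFactors = FALSE)

# Assumed display order of the timepoints within each subject
tp_levels <- c('pre', 'post', 'foldchange')

# Sort by subject first, then by each timepoint's position in tp_levels
ord <- order(df_foldchange$subject, match(df_foldchange$timepoint, tp_levels))
df_sorted <- df_foldchange[ord, ]
rownames(df_sorted) <- NULL  # renumber rows after reordering
print(df_sorted)
```

The same sort key should also work inside a dplyr pipeline, for example arrange(subject, match(timepoint, tp_levels)).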
Using data.table you could do the following:
df <- data.frame(subject = c('subject1', 'subject1', 'subject2', 'subject2'),
                 varA = c(1, 2, 1, 3),
                 varB = c(2, 3, 2, 4),
                 timepoint = c('pre', 'post', 'pre', 'post'))
library(data.table)
setDT(df)  # convert the data frame to a data.table by reference
# per subject, divide 'post' by 'pre' for each of the required columns
df2 <- df[, lapply(.SD, function(x) x[timepoint == "post"] / x[timepoint == "pre"]),
          by = subject, .SDcols = varA:varB]
df2[, timepoint := "foldchange"]  # label the new rows as "foldchange"
df_foldchange <- rbind(df, df2)   # bind the new rows to the original data
df_foldchange[order(subject)]     # print the result ordered by subject
# output
    subject varA varB  timepoint
1: subject1    1  2.0        pre
2: subject1    2  3.0       post
3: subject1    2  1.5 foldchange
4: subject2    1  2.0        pre
5: subject2    3  4.0       post
6: subject2    3  2.0 foldchange