I am trying to merge a vector 'means' to a dataframe. My dataframe looks like this Data = growth
I first calculated all the means for the different groups (1 group = population + temperature + size + replicat) using this command:
means<-aggregate(TL ~ Population + Temperature + Replicat + Size + Measurement, data=growth, list=growth$Name, mean)
Then, I selected the means for Measurement 1 as follows as I am only interested in these means.
meansT0<-means[which(means$Measurement=="1"),]
Now, I would like to merge this vector of means values to my dataframe (=growth) so that the right mean of each group corresponds to the right part of the dataframe.
The goal is to then substrat the mean of each group (at Measurement 1) to each element of the dataframe based on its belonging group (and for all other Measurements except Measurement 1). Maybe there is no need to add the means column to the dataframe? Do you know any command to do that ?
[27.06.18] I made up this simplified dataframe, I hope this help understanding. So, what I want is to substrat, for each individual in the dataframe and for each measurement (here only Measurement 1 and Measurement 2, normally I have more), the mean of its belongig group at MEASUREMENT 1.
So, if I get the means by group ( 1 group = Population + Temperature + Measurement):
means<-aggregate(TL ~ Population + Temperature + Measurement, data=growth, list=growth$Name, mean)
means
I got these values of means (in this example) :
Population Temperature Measurement TL
JUB 15 **1** **12.00000**
JUB 20 **1** **15.66667**
JUB 15 2 17.66667
JUB 20 2 18.66667
JUB 15 3 23.66667
JUB 20 3 24.33333
We are only interested by the means at MEASUREMENT 1. For each individual in the dataframe, I want to substrat the mean of its belonging group at Measurement 1: in this example (see dataframe with R command): - for the group JUB+15+Measurement 1 , mean = 12 - for the group JUB+20+Measurement 1 , mean = 15.66
growth<-data.frame(Population=c("JUB", "JUB", "JUB","JUB", "JUB", "JUB","JUB", "JUB", "JUB","JUB", "JUB", "JUB","JUB", "JUB", "JUB","JUB", "JUB", "JUB"), Measurement=c("1","1","1","1","1","1","2","2","2","2","2","2", "3", "3", "3", "3", "3", "3"),Temperature=c("15","15","15","20", "20", "20","15","15","15","20", "20", "20","15","15","15","20", "20", "20"),TL=c(11,12,13,15,18,14, 16,17,20,21,19,16, 25,22,24,26,24,23), New_TL=c("11-12", "12-12", "13-12", "15-15.66", "18-15.66", "14-15.66", "16-12", "17-12", "20-12", "21-15.66", "19-15.66", "16-15.66", "25-12", "22-12", "24-12", "26-15.66", "24-15.66", "23-15.66"))
print(growth)
I hope with this, you can understand better what I am trying to do. I have a lot of data and if I have to do this manually, this will take me a lot of time and increase the risk of me putting mistakes.
Here is an option with tidyverse
. After grouping by the group columns, use mutate_at
specifying the columns of interest and get the difference of that column ( .
) with the mean
of it.
library(tidyverse)
growth %>%
group_by(Population, Temperature, Replicat, Size, Measurement) %>%
mutate_at(vars(HL, TL), funs(MeanGroupDiff = .
- mean(.[Measurement == 1])))
Using a reproducible example with mtcars
dataset
data(mtcars)
mtcars %>%
group_by(cyl, vs) %>%
mutate_at(vars(mpg, disp), funs(MeanGroupDiff = .- mean(.[am==1])))
Have you considered using the data.table
package? It is very well suited for doing these kind of grouping, filtering, joining, and aggregation operations you describe, and might save you a great deal of time in the long run.
The code below shows how a workflow similar to the one you described but based on the built in mtcars
data set might look using data.table
.
To be clear, there are also ways to do what you describe using base R
as well as other packages like dplyr
, just throwing out a suggestion based on what I have found the most useful for my personal work.
library(data.table)
## Convert mtcars to a data.table
## only include columns `mpg`, `cyl`, `am` and `gear` for brevity
DT <- as.data.table(mtcars)[, .(mpg, cyl,am, gear)]
## Take a subset where `cyl` is equal to 6
DT <- DT[cyl == 6]
## Calculate grouped mean based on `gear` and `am` as grouping variables
DT[,group_mpg_avg := mean(mpg), keyby = .(gear, am)]
## Calculate each row's difference from the group mean
DT[,mpg_diff_from_group := mpg - group_mpg_avg]
print(DT)
# mpg cyl am gear group_mpg_avg mpg_diff_from_group
# 1: 21.4 6 0 3 19.75 1.65
# 2: 18.1 6 0 3 19.75 -1.65
# 3: 19.2 6 0 4 18.50 0.70
# 4: 17.8 6 0 4 18.50 -0.70
# 5: 21.0 6 1 4 21.00 0.00
# 6: 21.0 6 1 4 21.00 0.00
# 7: 19.7 6 1 5 19.70 0.00
Consider by
to subset your data frame by factors (but leave out Measurement in order to compare group 1 and all other groups). Then, run an ifelse
conditional logic calculation for needed columns. Since by
will return a list of data frames, bind all outside with do.call()
:
df_list <- by(growth, growth[,c("Population", "Temperature")], function(sub) {
# TL CORRECTION
sub$Correct_TL <- ifelse(sub$Measurement != 1,
sub$TL - mean(subset(sub, Measurement == 1)$TL),
sub$TL)
# ADD OTHER CORRECTIONS
return(sub)
})
final_df <- do.call(rbind, df_list)
Output (using posted data)
final_df
# Population Measurement Temperature TL New_TL Correct_TL
# 1 JUB 1 15 11 11-12 11.0000000
# 2 JUB 1 15 12 12-12 12.0000000
# 3 JUB 1 15 13 13-12 13.0000000
# 7 JUB 2 15 16 16-12 4.0000000
# 8 JUB 2 15 17 17-12 5.0000000
# 9 JUB 2 15 20 20-12 8.0000000
# 13 JUB 3 15 25 25-12 13.0000000
# 14 JUB 3 15 22 22-12 10.0000000
# 15 JUB 3 15 24 24-12 12.0000000
# 4 JUB 1 20 15 15-15.66 15.0000000
# 5 JUB 1 20 18 18-15.66 18.0000000
# 6 JUB 1 20 14 14-15.66 14.0000000
# 10 JUB 2 20 21 21-15.66 5.3333333
# 11 JUB 2 20 19 19-15.66 3.3333333
# 12 JUB 2 20 16 16-15.66 0.3333333
# 16 JUB 3 20 26 26-15.66 10.3333333
# 17 JUB 3 20 24 24-15.66 8.3333333
# 18 JUB 3 20 23 23-15.66 7.3333333
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.