I have a data frame made up of absorption spectra from multiple sample runs (samples a, b, c and d), where Ydata is wavelength and Xdata is absorption. I am calculating a baseline-corrected absorption by subtracting the average absorption over a quiet wavelength range, away from the peaks of interest.
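For a single run, this correction amounts to subtracting one number (the mean absorption inside the quiet window) from every point of that run. Here is a minimal sketch. It assumes the quiet window is 4 < Ydata < 8 (the range used in the loop in this question) and uses the values of run a from the simplified data frame:

```r
# Wavelength (Ydata) and absorption (Xdata) of a single run (run "a" of the simplified data)
wavelength <- 1:10
absorption <- seq(1, 10, 1)

# Quiet window away from the peaks: assumed here to be 4 < wavelength < 8
in_window <- wavelength > 4 & wavelength < 8

# Baseline = mean absorption inside the window; subtract it from every point of the run
baseline <- mean(absorption[in_window])
abscorr  <- absorption - baseline

baseline  # 6
abscorr   # -5 -4 -3 -2 -1  0  1  2  3  4
```

The grouped versions below repeat this calculation once per sample run.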
Simplified data frame:
DF <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 10),
  Ydata = rep(1:10, times = 4),
  Xdata = c(seq(1, 10, 1), seq(5, 50, 5), seq(20, 11, -1), seq(0.3, 3, 0.3)),
  abscorr = NA,
  stringsAsFactors = TRUE  # keep group a factor: the loop below uses levels(DF$group)
)
I need to correct each sample run by subtracting the mean absorption over a subset of wavelengths (here 4 < Ydata < 8) within that run. I've been doing it this way:
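for (i in seq_along(levels(DF$group))) {
  # all rows of the i-th sample run
  sub1 <- subset(DF, group == levels(DF$group)[i], select = c(group, Ydata, Xdata))
  # the quiet wavelength range used for the baseline
  sub2 <- subset(sub1, Ydata > 4 & Ydata < 8, select = c(group, Ydata, Xdata))
  sub1$abscorr <- sub1$Xdata - mean(sub2$Xdata)
  # prepend the corrected rows; the original rows (abscorr = NA) stay in DF
  DF <- rbind(sub1, DF)
}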
The original rows, whose abscorr is still NA, stay in DF, so I then drop them:
DF <- na.omit(DF)
The approach above is clunky: it loops over the groups and grows DF with rbind() on every pass. Is there a better way to do this for a large dataset? Perhaps with dplyr?
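Even within a loop, most of the clunkiness comes from growing DF with rbind() and then dropping the leftover NA rows. The following is a sketch, assuming the same window 4 < Ydata < 8. It writes each group's result directly into the matching rows of abscorr, so DF keeps its 40 rows and no na.omit() step is needed:

```r
DF <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 10),
  Ydata = rep(1:10, times = 4),
  Xdata = c(seq(1, 10, 1), seq(5, 50, 5), seq(20, 11, -1), seq(0.3, 3, 0.3)),
  abscorr = NA_real_
)

# unique() works whether group is a character vector or a factor
for (g in unique(DF$group)) {
  rows   <- DF$group == g                        # all rows of this sample run
  window <- rows & DF$Ydata > 4 & DF$Ydata < 8   # quiet range within the run
  DF$abscorr[rows] <- DF$Xdata[rows] - mean(DF$Xdata[window])
}
```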
Try dplyr:
library(dplyr)
DF <- DF %>%
  group_by(group) %>%
  mutate(abscorr = Xdata - mean(Xdata[Ydata < 8 & Ydata > 4])) %>%  # per-run baseline
  ungroup()
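If you would rather avoid the dplyr dependency, you can write the same grouped calculation in base R with ave(). That function applies a function within each group and returns results aligned with the original rows. This swaps in a different tool from the dplyr answer above. It is a sketch that takes the window 4 < Ydata < 8 from the question:

```r
# Simplified data from the question
DF <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 10),
  Ydata = rep(1:10, times = 4),
  Xdata = c(seq(1, 10, 1), seq(5, 50, 5), seq(20, 11, -1), seq(0.3, 3, 0.3))
)

# Keep Xdata only inside the quiet window; NA everywhere else
in_window <- DF$Ydata > 4 & DF$Ydata < 8
window_x  <- ifelse(in_window, DF$Xdata, NA)

# ave() gives every row the window mean of its own group
baseline <- ave(window_x, DF$group, FUN = function(v) mean(v, na.rm = TRUE))

DF$abscorr <- DF$Xdata - baseline
```

Because ave() keeps the original row order, this works even if DF is not sorted by group.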
I believe this will do it.
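fun <- function(x){
  # baseline = mean Xdata over the quiet range 4 < Ydata < 8 of this run
  x$Xdata - mean(x[which(x$Ydata > 4 & x$Ydata < 8), "Xdata"])
}
# split() returns the runs in group order, so this assumes DF is sorted by group
DF$abscorr <- do.call(c, lapply(split(DF, DF$group), fun))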
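The do.call(c, ...) step relies on DF already being sorted by group. This is because split() returns the groups in level order, and c() simply concatenates them. Here is a sketch of an order-independent variant. It swaps in unsplit() from base R, which reverses split() and puts each group's values back at that group's original row positions. It reuses the same fun (redefined so the block runs on its own) and a deliberately shuffled copy of the question's data:

```r
DF <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 10),
  Ydata = rep(1:10, times = 4),
  Xdata = c(seq(1, 10, 1), seq(5, 50, 5), seq(20, 11, -1), seq(0.3, 3, 0.3))
)

fun <- function(x){
  x$Xdata - mean(x[which(x$Ydata > 4 & x$Ydata < 8), "Xdata"])
}

# Shuffle the rows so that the data are no longer sorted by group
set.seed(1)
shuffled <- DF[sample(nrow(DF)), ]

# unsplit() reassembles the per-group results in the row order of `shuffled`
shuffled$abscorr <- unsplit(lapply(split(shuffled, shuffled$group), fun),
                            shuffled$group)
```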
Note that when I tested it against the result of the loop, all.equal reported a series of differences, namely that the attributes of the two results differ. So I ran the following:
fun <- function(x){
x$Xdata - mean(x[which(x$Ydata > 4 & x$Ydata < 8), "Xdata"])
}
DF2 <- DF
DF2$abscorr <- do.call(c, lapply(split(DF2, DF2$group), fun))
all.equal(DF[order(DF$group, DF$Ydata), ], DF2)
# [1] "Attributes: < Names: 1 string mismatch >"
# [2] "Attributes: < Length mismatch: comparison on first 2 components >"
# [3] "Attributes: < Component 2: names for target but not for current >"
# [4] "Attributes: < Component 2: Attributes: < Modes: list, NULL > >"
# [5] "Attributes: < Component 2: Attributes: < Lengths: 1, 0 > >"
# [6] "Attributes: < Component 2: Attributes: < names for target but not for current > >"
# [7] "Attributes: < Component 2: Attributes: < current is not list-like > >"
# [8] "Attributes: < Component 2: target is omit, current is numeric >"
# [9] "Component “abscorr”: Modes: numeric, logical"
#[10] "Component “abscorr”: target is numeric, current is logical"
As you can see, there is no difference in the calculated values of abscorr, only in the attributes. Among those, the differences are in the na.action attribute (added by na.omit) and in the row names. I wouldn't worry if I were you, since the values of abscorr are equal.
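To see where these attribute differences come from, consider a small sketch on a toy data frame (not the question's data). na.omit() records the dropped rows in an "na.action" attribute of class "omit", and it keeps the original row names of the surviving rows. If you only care about the values, all.equal() accepts check.attributes = FALSE:

```r
toy <- data.frame(x = c(1, NA, 3), y = c(10, 20, 30))
cleaned <- na.omit(toy)

attr(cleaned, "na.action")   # index of the dropped row, with class "omit"
rownames(cleaned)            # "1" "3": the original row names are kept

# Same values, but built fresh, so it has default row names and no na.action
fresh <- data.frame(x = c(1, 3), y = c(10, 30))

isTRUE(all.equal(cleaned, fresh))                            # FALSE: attributes differ
isTRUE(all.equal(cleaned, fresh, check.attributes = FALSE))  # TRUE: values match
```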
EDIT.
Note that if I sort DF and then set the problem attributes (the row names and the na.action attribute) to NULL, both all.equal and the much stricter identical return TRUE.
DF1 <- DF[order(DF$group, DF$Ydata), ] # Modify a copy, keep the original
row.names(DF1) <- NULL
attr(DF1, "na.action") <- NULL
all.equal(DF1, DF2)
#[1] TRUE
identical(DF1, DF2)
#[1] TRUE