Subtracting mean of every two columns from two dataframes R
Suppose I have two data frames as follows:
df1 <- data.frame(ceiling(runif(10,1,10)), ceiling(runif(10,1,10)), ceiling(runif(10,1,10)))
colnames(df1) <- c("V1","V2","V3")
df2 <- data.frame(ceiling(runif(10,1,10)), ceiling(runif(10,1,10)), ceiling(runif(10,1,10)))
colnames(df2) <- c("V1","V2","V3")
Using this dummy data, I want to create a new dataframe with 1 column and 3 rows:
V1
1 mean(df1$V1) - mean(df2$V1)
2 mean(df1$V2) - mean(df2$V2)
3 mean(df1$V3) - mean(df2$V3)
I also want to create another dataframe as follows:
V1
1 wilcox.test(df1$V1,df2$V1)$p.value
2 wilcox.test(df1$V2,df2$V2)$p.value
3 wilcox.test(df1$V3,df2$V3)$p.value
My real data has 54 columns, so for my data each resulting dataframe would have 54 rows.
Means:
data.frame(mean = colMeans(df1) - colMeans(df2))
# mean
# V1 1.4
# V2 2.0
# V3 1.4
P-values:
data.frame(
p.value = mapply(function(x, y) wilcox.test(x, y)$p.value, df1, df2)
)
# p.value
# V1 0.32060365
# V2 0.07784363
# V3 0.21779915
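For the 54-column case, both results can be collected in one table. The following is a minimal sketch, assuming both data frames have the same column names in the same order. The seed, the stopifnot() guard, and the name res are illustrative and not part of the original answer. exact = FALSE is passed only to silence the ties warning that integer-valued data triggers; with ties present, wilcox.test falls back to the normal approximation anyway.

```r
set.seed(1)  # arbitrary seed so the dummy data is reproducible

df1 <- data.frame(V1 = ceiling(runif(10, 1, 10)),
                  V2 = ceiling(runif(10, 1, 10)),
                  V3 = ceiling(runif(10, 1, 10)))
df2 <- data.frame(V1 = ceiling(runif(10, 1, 10)),
                  V2 = ceiling(runif(10, 1, 10)),
                  V3 = ceiling(runif(10, 1, 10)))

# Columns are paired by position, so the names must line up
stopifnot(identical(names(df1), names(df2)))

# One row per column: difference in means and Wilcoxon p-value
res <- data.frame(
  mean_diff = colMeans(df1) - colMeans(df2),
  p.value   = mapply(function(x, y) wilcox.test(x, y, exact = FALSE)$p.value,
                     df1, df2)
)
res
```

The row names of res are the column names of df1, so the table scales to any number of columns without changes.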
Q1
data.frame(mean=sapply(df1, mean)-sapply(df2,mean))
Q2
out <- numeric(ncol(df1)) # preallocate one slot per column
for (i in seq_along(df1)) out[i] <- wilcox.test(df1[, i], df2[, i])$p.value
data.frame(p = out)
You can do it using a vector of ones:
m1 = (t(df1) %*% rep(1, nrow(df1))) / nrow(df1) # Equivalent to a mean
m2 = (t(df2) %*% rep(1, nrow(df2))) / nrow(df2)
m1-m2
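Multiplying the transposed data by a vector of ones sums each column, and dividing by the number of rows turns the sums into means. Below is a small check on made-up data that this agrees with colMeans(). The seed and the names ones and diff_means are illustrative. Note that m1 and m2 are 3 x 1 matrices rather than named vectors, so drop() is used to flatten them.

```r
set.seed(1)  # arbitrary seed so the dummy data is reproducible

df1 <- data.frame(V1 = ceiling(runif(10, 1, 10)),
                  V2 = ceiling(runif(10, 1, 10)),
                  V3 = ceiling(runif(10, 1, 10)))
df2 <- data.frame(V1 = ceiling(runif(10, 1, 10)),
                  V2 = ceiling(runif(10, 1, 10)),
                  V3 = ceiling(runif(10, 1, 10)))

ones <- rep(1, nrow(df1))

# t(df) %*% ones gives the column sums; dividing by nrow gives the means
m1 <- (t(df1) %*% ones) / nrow(df1)
m2 <- (t(df2) %*% ones) / nrow(df2)

dim(m1)                              # a one-column matrix, not a vector
all.equal(drop(m1), colMeans(df1))   # compare against the built-in

# Flatten to a named vector of mean differences
diff_means <- drop(m1 - m2)
diff_means
```

For plain column means, colMeans() is shorter and returns a named vector directly; the matrix form is mainly useful if the rest of the computation is already expressed in linear algebra.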
Here's a tidyverse approach to create a table with info about the tests you've performed:
# for reproducibility
set.seed(215)
# example datasets
df1 <- data.frame(ceiling(runif(10,1,10)), ceiling(runif(10,1,10)), ceiling(runif(10,1,10)))
colnames(df1) <- c("V1","V2","V3")
df2 <- data.frame(ceiling(runif(10,1,10)), ceiling(runif(10,1,10)), ceiling(runif(10,1,10)))
colnames(df2) <- c("V1","V2","V3")
library(tidyverse)
list(df1, df2) %>% # put your dataframes in a list
map_df(data.frame, .id = "df") %>% # create a dataframe with an id value for each dataset
as_tibble() %>% # for visualisation purposes only (tbl_df() is deprecated)
gather(v, x, -df) %>% # reshape data
nest(-v) %>% # nest data
mutate(w.t = map(data, ~wilcox.test(.x$x ~ .x$df)), # perform Wilcoxon test
pval = map_dbl(w.t, "p.value"), # extract p value
mean_diff = map_dbl(data, ~mean(.x$x[.x$df==1])-mean(.x$x[.x$df==2]))) # calculate mean difference
# # A tibble: 3 x 5
# v data w.t pval mean_diff
# <chr> <list> <list> <dbl> <dbl>
# 1 V1 <tibble [20 x 2]> <S3: htest> 0.730 0.600
# 2 V2 <tibble [20 x 2]> <S3: htest> 0.145 -1.8
# 3 V3 <tibble [20 x 2]> <S3: htest> 0.0295 2.8
Column v represents your variables (the initial columns).
Column data holds the nested data used for the corresponding test.
Column w.t holds the test output.
Column pval is the p-value extracted from each test.
Column mean_diff is the difference in means (df1 minus df2).
If you save the result of the pipeline above as results, you can use results$w.t to see the full test outputs.
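The same idea of keeping the full test objects next to the extracted values can be sketched in base R. This is an illustrative alternative, not the answer's own code: the names tests and results are hypothetical, and exact = FALSE is passed only to avoid the ties warning on integer-valued data.

```r
set.seed(215)  # same seed as above, for reproducibility

df1 <- data.frame(V1 = ceiling(runif(10, 1, 10)),
                  V2 = ceiling(runif(10, 1, 10)),
                  V3 = ceiling(runif(10, 1, 10)))
df2 <- data.frame(V1 = ceiling(runif(10, 1, 10)),
                  V2 = ceiling(runif(10, 1, 10)),
                  V3 = ceiling(runif(10, 1, 10)))

# Keep the whole htest object for every pair of columns
tests <- Map(function(x, y) wilcox.test(x, y, exact = FALSE), df1, df2)

# Summary table built from the stored test objects
results <- data.frame(
  v         = names(tests),
  pval      = vapply(tests, function(tt) tt$p.value, numeric(1)),
  mean_diff = colMeans(df1) - colMeans(df2),
  row.names = NULL
)
results

# Full printed output of a single test
tests$V1
```

Because tests is a named list, any other component of a test (for example tests$V2$statistic) stays available after the summary table has been built.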