R loop question: looping through a dataframe
I am new to R and need some help figuring out a problem. In summary, I have a dataframe with different values: 10 rows and 6 columns. Each column represents a variable: column 1: n1, column 2: mean1, column 3: variance1, column 4: n2, column 5: mean2, column 6: variance2. Each row is a different combination of these variables. I want to iterate through each row and generate two samples: sample 1, random normal variables with n1, mean1 and sd1 (the square root of variance1), and sample 2, random normal variables with n2, mean2 and sd2 (the square root of variance2). Can someone let me know what would be the best way to proceed? Thanks for the help.
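The question describes variance columns, but rnorm() takes a standard deviation. A minimal sketch of the conversion, assuming hypothetical columns named variance1 and variance2 (the values are made up for illustration; the dput() sample below already stores sd1 and sd2, so it would not need this step):

```r
# Hypothetical parameter table with variances, as described in the question
params_var <- data.frame(
  n1 = c(5, 10), mean1 = c(4, 4), variance1 = c(1, 25),
  n2 = c(3, 3), mean2 = c(15, 15), variance2 = c(100, 16)
)

# rnorm() expects a standard deviation, so take the square root of each variance
params_var$sd1 <- sqrt(params_var$variance1)
params_var$sd2 <- sqrt(params_var$variance2)

params_var[, c("sd1", "sd2")]
```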
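Here is some sample data, produced with the dput() function. Note that it stores the standard deviations sd1 and sd2 directly rather than the variances, and that its column order is n1, n2, mean1, mean2, sd1, sd2: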
params <- structure(list(n1 = c(5, 10, 5, 10, 5, 10), n2 = c(3, 3, 6, 6,
3, 3), mean1 = c(4, 4, 4, 4, 6, 6), mean2 = c(15, 15, 15, 15,
15, 15), sd1 = c(1, 1, 1, 1, 1, 1), sd2 = c(10, 10, 10, 10, 10,
10)), out.attrs = list(dim = c(n1 = 2L, n2 = 2L, mean1 = 2L,
mean2 = 2L, sd1 = 2L, sd2 = 2L), dimnames = list(n1 = c("n1= 5",
"n1=10"), n2 = c("n2=3", "n2=6"), mean1 = c("mean1=4", "mean1=6"
), mean2 = c("mean2=15", "mean2=20"), sd1 = c("sd1=1", "sd1=5"
), sd2 = c("sd2=10", "sd2= 4"))), row.names = c(NA, 6L), class = "data.frame")
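The out.attrs attribute in the dput() output is the one expand.grid() attaches, so the sample looks like the first rows of a full factorial grid. Assuming the value pairs listed in its dimnames are the intended levels, the whole grid could be rebuilt as below. This is a sketch under that assumption: it yields 64 rows, not the 10 mentioned in the question.

```r
# Assumption: the parameter table came from expand.grid(); the levels are
# taken from the dimnames stored in out.attrs
grid <- expand.grid(
  n1 = c(5, 10), n2 = c(3, 6),
  mean1 = c(4, 6), mean2 = c(15, 20),
  sd1 = c(1, 5), sd2 = c(10, 4)
)

nrow(grid)     # 64 combinations (2^6)
head(grid, 6)  # matches the six rows shown in the dput() output
```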
You can save the generated data in lists, where params is the data frame of parameters:
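# Empty lists to hold one sample per parameter row
data1 <- list()
data2 <- list()
# seq_len(nrow(params)) is safe even when params has zero rows
for (i in seq_len(nrow(params))) {
  # Sample 1: n1 draws from a normal with mean1 and sd1
  data1[[i]] <- rnorm(n = params$n1[i], mean = params$mean1[i], sd = params$sd1[i])
  # Sample 2: n2 draws from a normal with mean2 and sd2
  data2[[i]] <- rnorm(n = params$n2[i], mean = params$mean2[i], sd = params$sd2[i])
}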
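The same loop can be written without explicit indexing using base R's Map(), which calls rnorm() once per row with the i-th element of each column. A sketch, with the sample rows re-entered so the block runs on its own. Because all of sample 1 is drawn before sample 2 here, the values will differ from the loop's even with the same seed.

```r
# The six sample rows from the question (without the out.attrs attribute)
params <- data.frame(
  n1 = c(5, 10, 5, 10, 5, 10), n2 = c(3, 3, 6, 6, 3, 3),
  mean1 = c(4, 4, 4, 4, 6, 6), mean2 = rep(15, 6),
  sd1 = rep(1, 6), sd2 = rep(10, 6)
)

set.seed(42)
# Map() calls rnorm(n1[i], mean1[i], sd1[i]) for each row i and returns a list
data1 <- Map(rnorm, params$n1, params$mean1, params$sd1)
data2 <- Map(rnorm, params$n2, params$mean2, params$sd2)

lengths(data1)  # one sample of size n1 per row
```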
You did not indicate how you plan to use the results. This will store both sets of random numbers in a list:
set.seed(42) # For reproducibility
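# apply() passes each row as a numeric vector x, in the column order
# n1, n2, mean1, mean2, sd1, sd2
results <- apply(params, 1, function(x) list(first  = rnorm(x[1], x[3], x[5]),
                                             second = rnorm(x[2], x[4], x[6])))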
results[[1]]
# $first
# [1] 5.370958 3.435302 4.363128 4.632863 4.404268
#
# $second
# [1] 13.93875 30.11522 14.05341
#
results[[1]]$first
# [1] 5.370958 3.435302 4.363128 4.632863 4.404268
results[[1]]$second
# [1] 13.93875 30.11522 14.05341
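Once the samples are stored this way, lapply() and vapply() can pull pieces back out of the nested list. A minimal sketch that collects every first sample and tabulates the per-row sample means; the block repeats the setup so it runs on its own.

```r
params <- data.frame(
  n1 = c(5, 10, 5, 10, 5, 10), n2 = c(3, 3, 6, 6, 3, 3),
  mean1 = c(4, 4, 4, 4, 6, 6), mean2 = rep(15, 6),
  sd1 = rep(1, 6), sd2 = rep(10, 6)
)
set.seed(42)
results <- apply(params, 1, function(x) list(first  = rnorm(x[1], x[3], x[5]),
                                             second = rnorm(x[2], x[4], x[6])))

# Every "first" sample, one list element per parameter row
firsts <- lapply(results, `[[`, "first")

# Per-row sample means of both samples; vapply() checks each result is one number
means <- data.frame(
  mean_first  = vapply(results, function(r) mean(r$first),  numeric(1)),
  mean_second = vapply(results, function(r) mean(r$second), numeric(1))
)
means
```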
If you want to use these to compute a t-test, you can do that directly without storing the randomly generated values:
set.seed(42)
results.t <- apply(params, 1, function(x) t.test(rnorm(x[1], x[3], x[5]),
                                                 rnorm(x[2], x[4], x[6])))
results.t[[1]]
#
# Welch Two Sample t-test
#
# data: rnorm(x[1], x[3], x[5]) and rnorm(x[2], x[4], x[6])
# t = -2.7736, df = 2.0133, p-value = 0.1083
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -37.938884 8.083236
# sample estimates:
# mean of x mean of y
# 4.441304 19.369128
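Each element of results.t is an "htest" object, so its statistic and p.value components can be extracted and bound back onto the parameter table. A sketch of one way to build such a summary; the block repeats the setup so it runs on its own.

```r
params <- data.frame(
  n1 = c(5, 10, 5, 10, 5, 10), n2 = c(3, 3, 6, 6, 3, 3),
  mean1 = c(4, 4, 4, 4, 6, 6), mean2 = rep(15, 6),
  sd1 = rep(1, 6), sd2 = rep(10, 6)
)
set.seed(42)
results.t <- apply(params, 1, function(x) t.test(rnorm(x[1], x[3], x[5]),
                                                 rnorm(x[2], x[4], x[6])))

# One row per parameter combination, with the Welch t statistic and p-value
summary_tbl <- cbind(
  params,
  t       = vapply(results.t, function(tt) unname(tt$statistic), numeric(1)),
  p.value = vapply(results.t, function(tt) tt$p.value, numeric(1))
)
summary_tbl
```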
Or you can use results:
results.t2 <- lapply(results, function(x) t.test(x$first, x$second))
results.t2[[1]]
#
# Welch Two Sample t-test
#
# data: x$first and x$second
# t = -2.7736, df = 2.0133, p-value = 0.1083
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -37.938884 8.083236
# sample estimates:
# mean of x mean of y
# 4.441304 19.369128
A purrr way:
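library(purrr)
library(dplyr)
# Split params into a list of one-row tibbles, then draw both samples per row.
# The samples are returned as a list because n1 and n2 can differ, and a
# tibble (or bind_rows()) cannot hold columns of different lengths.
params %>%
  group_nest(row_number()) %>%
  pull(data) %>%
  map(~ list(first  = rnorm(n = .x$n1, mean = .x$mean1, sd = .x$sd1),
             second = rnorm(n = .x$n2, mean = .x$mean2, sd = .x$sd2)))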
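purrr can also iterate over the rows directly with pmap(), which passes each column of a data frame to the function as a named argument, so no nesting step is needed. A sketch, assuming purrr is installed; since it draws the first sample and then the second for each row, in row order, set.seed(42) should reproduce the same first-row values as the apply() answer.

```r
library(purrr)

params <- data.frame(
  n1 = c(5, 10, 5, 10, 5, 10), n2 = c(3, 3, 6, 6, 3, 3),
  mean1 = c(4, 4, 4, 4, 6, 6), mean2 = rep(15, 6),
  sd1 = rep(1, 6), sd2 = rep(10, 6)
)

set.seed(42)
# Columns are matched to arguments by name; `...` absorbs any extra columns
samples <- pmap(params, function(n1, n2, mean1, mean2, sd1, sd2, ...) {
  list(first  = rnorm(n = n1, mean = mean1, sd = sd1),
       second = rnorm(n = n2, mean = mean2, sd = sd2))
})

samples[[1]]$first
```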