How to run a function dynamically from 1 to n times in R?
I have a data set named master that contains survey data, structured like:
Pid state msr_01 foot_01 msr_02 foot_02 … msr_n foot_n
I want to have n data sets fetched from master, like:
out_01 contains: Pid state msr_01 foot_01 msrid
out_02 contains: Pid state msr_02 foot_02 msrid
out_n contains: Pid state msr_n foot_n msrid
The function below does this:
gen_wkds <- function(df, pno, st, col1, col2, newcol, newvalue) {
  colnames <- c(pno, st, col1, col2)
  new_df <- df[, c(colnames)]
  colnames(new_df)[3] <- "Rate"
  colnames(new_df)[4] <- "Footnote"
  new_df[[newcol]] <- newvalue
  return(new_df)
}
How can I run this function dynamically from 1 to n times and generate n data sets?
This question deserves an answer or two, actually:
The OP has not provided a reproducible example, so we are using a made-up dataset:
master
  Pid state msr_01 foot_01 msr_02 foot_02 msr_03 foot_03
1   1    OK     11      A1     21      B1     31      C1
2   2    OK     12      A2     22      B2     32      C2
The function gen_wkds() can be called multiple times using lapply():
lmaster <- lapply(1:3, function(x)
gen_wkds(master, "Pid", "state", sprintf("msr_%02i", x), sprintf("foot_%02i", x), "msrid", x))
which creates a list of data frames:
lmaster
[[1]]
  Pid state Rate Footnote msrid
1   1    OK   11       A1     1
2   2    OK   12       A2     1

[[2]]
  Pid state Rate Footnote msrid
1   1    OK   21       B1     2
2   2    OK   22       B2     2

[[3]]
  Pid state Rate Footnote msrid
1   1    OK   31       C1     3
2   2    OK   32       C2     3
The list elements can be named by
names(lmaster) <- sprintf("out_%02i", seq_along(lmaster))
so lmaster becomes
$out_01
  Pid state Rate Footnote msrid
1   1    OK   11       A1     1
2   2    OK   12       A2     1

$out_02
  Pid state Rate Footnote msrid
1   1    OK   21       B1     2
2   2    OK   22       B2     2

$out_03
  Pid state Rate Footnote msrid
1   1    OK   31       C1     3
2   2    OK   32       C2     3
Note that sprintf() is used with the %02i format specifier in order to create the names (2 digits, padded with a leading zero).
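To make the padding concrete, here is a minimal sketch of what the %02i format produces. The vector ids is a hypothetical input chosen only for illustration; it is not part of the original answer.

```r
# Hypothetical ids, chosen to show one-digit and two-digit cases
ids <- c(1L, 9L, 10L)

# %02i pads integers to a width of 2 with leading zeros
col_names <- sprintf("msr_%02i", ids)
col_names
# [1] "msr_01" "msr_09" "msr_10"
```

Zero-padded names also sort in numeric order when sorted alphabetically, which is convenient once n exceeds 9.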
Normally, we would stop here because storing a bunch of datasets of the same structure in a list makes it easier to apply subsequent processing steps.
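As a rough illustration of why the list form is convenient, the sketch below rebuilds the named list and then applies a few typical follow-up steps. The made-up data and a compact version of gen_wkds() are repeated only so the block runs on its own; the follow-up steps (mean of Rate, stacking with rbind) are assumptions about what later processing might look like, not something the OP asked for.

```r
# Made-up data and a compact gen_wkds(), repeated so the sketch is self-contained
master <- data.frame(
  Pid = 1:2, state = "OK",
  msr_01 = 10 + 1:2, foot_01 = paste0("A", 1:2),
  msr_02 = 20 + 1:2, foot_02 = paste0("B", 1:2),
  msr_03 = 30 + 1:2, foot_03 = paste0("C", 1:2)
)
gen_wkds <- function(df, pno, st, col1, col2, newcol, newvalue) {
  new_df <- df[, c(pno, st, col1, col2)]
  colnames(new_df)[3:4] <- c("Rate", "Footnote")
  new_df[[newcol]] <- newvalue
  new_df
}
lmaster <- lapply(1:3, function(x)
  gen_wkds(master, "Pid", "state",
           sprintf("msr_%02i", x), sprintf("foot_%02i", x), "msrid", x))
names(lmaster) <- sprintf("out_%02i", seq_along(lmaster))

# Pick a single data set by name, no separate object needed
out_02 <- lmaster[["out_02"]]

# Apply the same (hypothetical) step to every data set in one call
mean_rate <- sapply(lmaster, function(d) mean(d$Rate))

# Stack all data sets into one data frame if a combined view is needed
stacked <- do.call(rbind, lmaster)
```

The same lapply()/sapply() call works unchanged whether the list holds 3 or 100 data sets.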
However, the OP has requested to generate n data sets. This can be achieved by
list2env(lmaster, envir = globalenv())
Again, this is not recommended as it clutters the workspace with n separate objects, as can be seen here:
ls()
[1] "gen_wkds" "lmaster" "master" "out_01" "out_02" "out_03"
(here, we only have 3 separate datasets, but imagine n == 100 ...)
Thanks to OP's explanation of the background of the question it is clear that the primary intent is to reshape the data from wide to long format. This is a common operation in data wrangling. So, several tools are available, e.g.:
reshape() from base R
pivot_longer() from the tidyr package
melt() from the data.table package
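For comparison, the first option in the list can be sketched without any additional packages. This is one possible way to call reshape() on the made-up data, not the answer's own code; the helper variables msr_cols and foot_cols and the result name long_base are introduced here for illustration only.

```r
# Made-up data, repeated so the sketch is self-contained
master <- data.frame(
  Pid = 1:2, state = "OK",
  msr_01 = 10 + 1:2, foot_01 = paste0("A", 1:2),
  msr_02 = 20 + 1:2, foot_02 = paste0("B", 1:2),
  msr_03 = 30 + 1:2, foot_03 = paste0("C", 1:2)
)

# Collect the two groups of wide columns by their name prefix
msr_cols  <- grep("^msr_",  names(master), value = TRUE)
foot_cols <- grep("^foot_", names(master), value = TRUE)

# Reshape from wide to long: one row per Pid and measure
long_base <- reshape(
  master,
  direction = "long",
  idvar     = c("Pid", "state"),
  varying   = list(msr_cols, foot_cols),
  v.names   = c("Rate", "Footnote"),
  timevar   = "msrid",
  times     = seq_along(msr_cols)
)
rownames(long_base) <- NULL  # drop the composite row names reshape() creates
long_base
```

Because the column groups are found by pattern, the call does not need to change when more msr_/foot_ pairs are added.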
My preferred option is melt() as it is straightforward to use, IMHO.
library(data.table)
long <- melt(setDT(master), id.vars = c("Pid", "state"), measure.vars = patterns("msr", "foot"),
variable.name = "msrid", value.name = c("Rate", "Footnote"))
long
   Pid state msrid Rate Footnote
1:   1    OK     1   11       A1
2:   2    OK     1   12       A2
3:   1    OK     2   21       B1
4:   2    OK     2   22       B2
5:   1    OK     3   31       C1
6:   2    OK     3   32       C2
Here, the reshaped data are kept in one data object, long, which makes it easier to apply subsequent processing steps programmatically. The subsets can be identified and selected by the value in the msrid column.
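A minimal sketch of selecting and summarising by msrid follows. To keep it free of package dependencies, long is rebuilt here as a plain data frame with the same values as the melt() output shown above; the summary step (mean Rate per msrid) is a hypothetical example of subsequent processing.

```r
# Stand-in for the melt() result: same values, but a plain data.frame
long <- data.frame(
  Pid      = rep(1:2, times = 3),
  state    = "OK",
  msrid    = factor(rep(1:3, each = 2)),
  Rate     = c(11, 12, 21, 22, 31, 32),
  Footnote = c("A1", "A2", "B1", "B2", "C1", "C2")
)

# Select the rows that would have been the separate object out_02
subset_02 <- long[long$msrid == "2", ]

# Summarise all subsets at once instead of looping over n objects
rate_by_msr <- aggregate(Rate ~ msrid, data = long, FUN = mean)
rate_by_msr
```

With the data.table object from the answer, the equivalent selection would be long[msrid == 2].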
For the sake of completeness, long can be turned into separate objects as well (not recommended) by:
library(magrittr) # piping used for readability
split(long, by = "msrid") %>%
set_names(sprintf("out_%02i", seq_along(.))) %>%
list2env(envir = globalenv())
Data (the made-up dataset used in the examples above):
master <- data.frame(
Pid = 1:2, state = "OK",
msr_01 = 10 + 1:2, foot_01 = paste0("A", 1:2),
msr_02 = 20 + 1:2, foot_02 = paste0("B", 1:2),
msr_03 = 30 + 1:2, foot_03 = paste0("C", 1:2)
)