
How to run a function dynamically from 1 to n times in R?

I have a data set named master that contains survey data, structured like this:

Pid state msr_01 foot_01 msr_02 foot_02 … msr_n foot_n

I want to extract n data sets from master, like this:

out_01 contains: Pid state msr_01 foot_01 msrid
out_02 contains: Pid state msr_02 foot_02 msrid
out_n contains: Pid state msr_n foot_n msrid

The function below does this:

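gen_wkds <- function(df, pno, st, col1, col2, newcol, newvalue) {
  # keep the two ID columns plus one measure/footnote pair
  cols <- c(pno, st, col1, col2)
  new_df <- df[, cols]
  # use common column names so that all subsets share the same structure
  colnames(new_df)[3:4] <- c("Rate", "Footnote")
  # tag every row with the measure number
  new_df[[newcol]] <- newvalue
  new_df
}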

How can I run this function dynamically from 1 to n times and generate n data sets?

This question actually deserves two answers:

  • The first answers the OP's original question, "How can I run this function dynamically from 1 to n times and generate n data sets?" (not recommended)
  • The second answers the underlying question of reshaping from wide to long format with multiple value columns (recommended)

1. Run the OP's function dynamically from 1 to n times

The OP has not provided a reproducible example, so we use a made-up dataset (see Data at the end):

master
  Pid state msr_01 foot_01 msr_02 foot_02 msr_03 foot_03
1   1    OK     11      A1     21      B1     31      C1
2   2    OK     12      A2     22      B2     32      C2

The function gen_wkds() can be called multiple times using lapply():

lmaster <- lapply(1:3, function(x) 
  gen_wkds(master, "Pid", "state", sprintf("msr_%02i", x), sprintf("foot_%02i", x), "msrid", x))
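The 1:3 above is hard-coded for the made-up data. A minimal sketch of deriving n from the data instead, assuming every measure column follows the msr_XX naming pattern (the variable name n_msr is illustrative, not from the OP):

```r
# Made-up data in the same shape as master
master <- data.frame(
  Pid = 1:2, state = "OK",
  msr_01 = 10 + 1:2, foot_01 = paste0("A", 1:2),
  msr_02 = 20 + 1:2, foot_02 = paste0("B", 1:2),
  msr_03 = 30 + 1:2, foot_03 = paste0("C", 1:2)
)

gen_wkds <- function(df, pno, st, col1, col2, newcol, newvalue) {
  new_df <- df[, c(pno, st, col1, col2)]
  colnames(new_df)[3:4] <- c("Rate", "Footnote")
  new_df[[newcol]] <- newvalue
  new_df
}

# Count the measure columns instead of hard-coding n (assumes msr_XX names)
n_msr <- length(grep("^msr_[0-9]+$", names(master)))

# seq_len() yields an empty sequence if no measure columns are found
lmaster <- lapply(seq_len(n_msr), function(x)
  gen_wkds(master, "Pid", "state",
           sprintf("msr_%02i", x), sprintf("foot_%02i", x), "msrid", x))
```

seq_len(n_msr) is used rather than 1:n_msr because 1:0 would evaluate to c(1, 0) and try to fetch a non-existent msr_00 column.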

which creates a list of data frames:

lmaster
[[1]]
  Pid state Rate Footnote msrid
1   1    OK   11       A1     1
2   2    OK   12       A2     1

[[2]]
  Pid state Rate Footnote msrid
1   1    OK   21       B1     2
2   2    OK   22       B2     2

[[3]]
  Pid state Rate Footnote msrid
1   1    OK   31       C1     3
2   2    OK   32       C2     3

The list elements can be named with

names(lmaster) <- sprintf("out_%02i", seq_along(lmaster))

so that lmaster becomes

$out_01
  Pid state Rate Footnote msrid
1   1    OK   11       A1     1
2   2    OK   12       A2     1

$out_02
  Pid state Rate Footnote msrid
1   1    OK   21       B1     2
2   2    OK   22       B2     2

$out_03
  Pid state Rate Footnote msrid
1   1    OK   31       C1     3
2   2    OK   32       C2     3

Note that sprintf() is used with the %02i format specifier to create the names (two digits, padded with a leading zero).
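A quick illustration of what the specifier produces. The 02 sets a minimum width, so larger numbers are not truncated:

```r
# %02i pads integers to at least two digits with leading zeros
ids <- sprintf("out_%02i", c(1L, 2L, 10L, 100L))
ids
```

If n can exceed 99, a wider specifier such as %03i keeps the names sorting in numeric order; otherwise "out_100" sorts before "out_11".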

Normally, we would stop here, because storing a bunch of datasets with the same structure in a list makes it easier to apply subsequent processing steps.
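To illustrate why a list is convenient, here is a sketch of two follow-up steps on lmaster: counting the rows of every subset with sapply() and stacking all subsets back into one data frame with do.call(rbind, ...). The data and function are rebuilt so that the snippet runs on its own:

```r
master <- data.frame(
  Pid = 1:2, state = "OK",
  msr_01 = 10 + 1:2, foot_01 = paste0("A", 1:2),
  msr_02 = 20 + 1:2, foot_02 = paste0("B", 1:2),
  msr_03 = 30 + 1:2, foot_03 = paste0("C", 1:2)
)

gen_wkds <- function(df, pno, st, col1, col2, newcol, newvalue) {
  new_df <- df[, c(pno, st, col1, col2)]
  colnames(new_df)[3:4] <- c("Rate", "Footnote")
  new_df[[newcol]] <- newvalue
  new_df
}

lmaster <- lapply(1:3, function(x)
  gen_wkds(master, "Pid", "state", sprintf("msr_%02i", x), sprintf("foot_%02i", x), "msrid", x))
names(lmaster) <- sprintf("out_%02i", seq_along(lmaster))

# One call processes every subset; the result is named after the list elements
rows_per_subset <- sapply(lmaster, nrow)

# Stacking the subsets gives a single long data frame
stacked <- do.call(rbind, lmaster)
rownames(stacked) <- NULL
```

The stacked result is essentially the long format discussed in part 2.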

However, the OP has asked to generate n separate data sets. This can be achieved with

list2env(lmaster, envir = globalenv())

Again, this is not recommended, because it clutters the workspace with n separate objects, as can be seen here:

ls()
 [1] "gen_wkds" "lmaster" "master" "out_01" "out_02" "out_03"

(Here we only have 3 separate datasets, but imagine n == 100 ...)
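If separate objects are really needed, one way to limit the clutter is to write them into a dedicated environment instead of the global one. A sketch, where out_env and the small stand-in data frames are illustrative only:

```r
# Two small stand-in subsets (illustrative values)
lmaster <- list(
  out_01 = data.frame(Pid = 1:2, Rate = c(11, 12)),
  out_02 = data.frame(Pid = 1:2, Rate = c(21, 22))
)

# A dedicated environment keeps the objects out of the global workspace
out_env <- new.env()
invisible(list2env(lmaster, envir = out_env))

# ls(out_env) returns c("out_01", "out_02"); single objects via out_env$out_02
ls(out_env)
out_env$out_02

# mget() gathers the objects back into a named list
back <- mget(ls(out_env), envir = out_env)
```

Objects that have already been written to the workspace can be removed again with rm(list = ls(pattern = "^out_")).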

2. Reshaping from wide to long format with multiple value columns

Thanks to the OP's explanation of the background of the question, it is clear that the primary intent is to reshape the data from wide to long format. This is a common operation in data wrangling, so several tools are available, e.g.:

  • reshape() from base R
  • pivot_longer() from the tidyr package
  • melt() from the data.table package
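For comparison with the melt() call that follows, a sketch of the same reshape with base R's reshape(), which needs no extra packages. The column-name patterns ^msr_ and ^foot_ are assumptions about how the real data are named:

```r
master <- data.frame(
  Pid = 1:2, state = "OK",
  msr_01 = 10 + 1:2, foot_01 = paste0("A", 1:2),
  msr_02 = 20 + 1:2, foot_02 = paste0("B", 1:2),
  msr_03 = 30 + 1:2, foot_03 = paste0("C", 1:2)
)

# Each element of 'varying' is collapsed into one long column named by 'v.names';
# 'timevar' records which msr_XX/foot_XX pair a row came from (1, 2, 3, ...)
long_base <- reshape(
  master,
  direction = "long",
  idvar     = c("Pid", "state"),
  varying   = list(grep("^msr_",  names(master), value = TRUE),
                   grep("^foot_", names(master), value = TRUE)),
  v.names   = c("Rate", "Footnote"),
  timevar   = "msrid"
)
rownames(long_base) <- NULL
long_base
```

Compared with melt(), the base R call is more verbose, but it keeps msrid as a plain integer and works on an ordinary data frame.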

My preferred option is melt(), as it is straightforward to use, IMHO.

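library(data.table)
# setDT() converts master to a data.table by reference;
# patterns() collects the msr_* and foot_* columns into Rate and Footnote
long <- melt(setDT(master), id.vars = c("Pid", "state"), measure.vars = patterns("msr", "foot"), 
             variable.name = "msrid", value.name = c("Rate", "Footnote"))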
long
   Pid state msrid Rate Footnote
1:   1    OK     1   11       A1
2:   2    OK     1   12       A2
3:   1    OK     2   21       B1
4:   2    OK     2   22       B2
5:   1    OK     3   31       C1
6:   2    OK     3   32       C2

Here, the reshaped data are kept in one data object, long, which makes it easier to apply subsequent processing steps programmatically. The subsets can be identified and selected by the value in the msrid column.
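As a sketch of such follow-up steps, the snippet below rebuilds long as a plain data frame (values copied from the output above, so it runs without data.table) and uses base R to pick one subset and to summarise every measure in a single grouped call:

```r
# Long-format data as printed above, rebuilt as a plain data frame
long <- data.frame(
  Pid      = rep(1:2, times = 3),
  state    = "OK",
  msrid    = rep(1:3, each = 2),
  Rate     = c(11, 12, 21, 22, 31, 32),
  Footnote = c("A1", "A2", "B1", "B2", "C1", "C2")
)

# The subset that would have been out_02
out_02 <- subset(long, msrid == 2)

# One grouped call replaces n separate computations
mean_rate <- aggregate(Rate ~ msrid, data = long, FUN = mean)
```

With the data.table object returned by melt(), msrid is a factor, so comparing against the label (e.g. msrid == "2") is the safer way to select a subset there.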

For the sake of completeness, long can also be turned into separate objects (not recommended):

library(magrittr) # piping used for readability
split(long, by = "msrid") %>% 
  set_names(sprintf("out_%02i", seq_along(.))) %>% 
  list2env(envir = globalenv())
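For readers not using data.table or magrittr, a base R sketch of the same step. Here long is rebuilt as a plain data frame from the printed output, and split() uses its default method with a grouping vector instead of data.table's by argument:

```r
long <- data.frame(
  Pid      = rep(1:2, times = 3),
  state    = "OK",
  msrid    = rep(1:3, each = 2),
  Rate     = c(11, 12, 21, 22, 31, 32),
  Footnote = c("A1", "A2", "B1", "B2", "C1", "C2")
)

# One piece per msrid value
pieces <- split(long, long$msrid)

# Name the pieces out_01, out_02, ... as in the magrittr version
names(pieces) <- sprintf("out_%02i", seq_along(pieces))

# Writes out_01, out_02, out_03 to the workspace (not recommended, see above)
invisible(list2env(pieces, envir = globalenv()))
```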

Data

master <- data.frame(
  Pid = 1:2, state = "OK", 
  msr_01 = 10 + 1:2, foot_01 = paste0("A", 1:2), 
  msr_02 = 20 + 1:2, foot_02 = paste0("B", 1:2),
  msr_03 = 30 + 1:2, foot_03 = paste0("C", 1:2)
)
