Passing multiple arguments via command line in R

I am trying to pass multiple file path arguments via the command line to an Rscript, where they can then be processed using an argument parser. Ultimately I would like something like this

Rscript test.R --inputfiles fileA.txt fileB.txt fileC.txt --printvar yes --size 10 --anotheroption helloworld -- etc...

passed through the command line, with the result available as an array in R once parsed

args$inputfiles = c("fileA.txt", "fileB.txt", "fileC.txt")

I have tried several parsers, including optparse and getopt, but neither of them seems to support this functionality. I know argparse does, but it is currently not available for R version 2.15.2.

Any ideas?

Thanks

Although it wasn't released on CRAN when this question was asked, a beta version of the argparse package is available there now and can do this. It is basically a wrapper around the popular Python module of the same name, so you need to have a recent version of Python installed to use it. See the install notes for more info. The included basic example sums an arbitrarily long list of numbers, and it should not be hard to modify so that it grabs an arbitrarily long list of input files.

> install.packages("argparse")
> library("argparse")
> example("ArgumentParser")
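As a rough sketch of that modification, the following assumes the argparse package and a working Python are both installed. The option names mirror the question, and the explicit argument vector is only a stand-in for the real command line.

```r
library(argparse)

parser <- ArgumentParser(description = "Collect a list of input files")
# nargs = "+" lets the option take one or more whitespace-separated values
parser$add_argument("--inputfiles", nargs = "+", help = "one or more input files")
parser$add_argument("--printvar", default = "no")
parser$add_argument("--size", type = "integer")

# Assumption: an explicit vector is passed here for illustration only;
# calling parser$parse_args() with no arguments reads the real command line.
args <- parser$parse_args(c("--inputfiles", "fileA.txt", "fileB.txt", "fileC.txt",
                            "--printvar", "yes", "--size", "10"))
print(args$inputfiles)
```

With this layout the file names arrive in `args$inputfiles` without any custom delimiter.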

At the top of your script test.R, put this:

args <- commandArgs(trailingOnly = TRUE)

hh <- paste(unlist(args),collapse=' ')
listoptions <- unlist(strsplit(hh,'--'))[-1]
options.args <- sapply(listoptions,function(x){
         unlist(strsplit(x, ' '))[-1]
        })
options.names <- sapply(listoptions,function(x){
  option <-  unlist(strsplit(x, ' '))[1]
})
names(options.args) <- unlist(options.names)
print(options.args)

to get:

$inputfiles
[1] "fileA.txt" "fileB.txt" "fileC.txt"

$printvar
[1] "yes"

$size
[1] "10"

$anotheroption
[1] "helloworld"

After searching around, and rather than writing a new package from scratch, I figured the best way to pass multiple arguments using the optparse package is to separate the input files with a character that is most likely not allowed in a file name (for example, a colon)

Rscript test.R --inputfiles fileA.txt:fileB.txt:fileC.txt etc...

File names can also contain spaces as long as the spaces are escaped, so that the whole value reaches optparse as a single argument

Rscript test.R --inputfiles file\ A.txt:file\ B.txt:fileC.txt etc...
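Inside the script, the joined value still has to be split back into individual file names. A minimal sketch in base R follows; `split_file_list` is a hypothetical helper name, and the literal string stands in for whatever value optparse returns for --inputfiles.

```r
# Hypothetical helper: split a delimiter-joined option value into file names.
split_file_list <- function(value, sep = ":") {
  # fixed = TRUE treats sep as a literal string, not a regular expression
  strsplit(value, sep, fixed = TRUE)[[1]]
}

# Stand-in for the value optparse would return for --inputfiles
inputfiles <- split_file_list("file A.txt:file B.txt:fileC.txt")
print(inputfiles)
```

Using `fixed = TRUE` keeps the helper safe if the separator is later changed to a character that has a special meaning in regular expressions, such as `|`.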

Ultimately, it would be nice to have a package (possibly a modified version of optparse) that supports multiple arguments as described in the question and shown below

Rscript test.R --inputfiles fileA.txt fileB.txt fileC.txt

One would think such a trivial feature would be implemented in a widely used package such as optparse

Cheers

@agstudy's solution does not work properly if the input arguments are lists of the same length. By default, sapply will collapse inputs of the same length into a matrix rather than a list. The fix is simple enough: explicitly set simplify to FALSE in the sapply call that parses the arguments.

args <- commandArgs(trailingOnly = TRUE)

hh <- paste(unlist(args),collapse=' ')
listoptions <- unlist(strsplit(hh,'--'))[-1]
options.args <- sapply(listoptions,function(x){
         unlist(strsplit(x, ' '))[-1]
        }, simplify=FALSE)
options.names <- sapply(listoptions,function(x){
  option <-  unlist(strsplit(x, ' '))[1]
})
names(options.args) <- unlist(options.names)
print(options.args)
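The difference can be seen in isolation with a small base R sketch. The two option strings and the name `take_values` are made up for illustration; each option is given exactly two values so that the default simplification kicks in.

```r
# Two options that each take two values (illustrative input)
opts <- c("inputfiles a.txt b.txt", "outputfiles c.txt d.txt")
take_values <- function(x) unlist(strsplit(x, " "))[-1]

# Default behaviour: equal-length results are collapsed into a character matrix
simplified <- sapply(opts, take_values)
# simplify = FALSE keeps one list element per option
kept <- sapply(opts, take_values, simplify = FALSE)

print(is.matrix(simplified))
print(is.list(kept))
```

Because the matrix form loses the one-element-per-option structure, the later names() assignment no longer labels whole options, which is why the list form is the safer default here.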

The way you describe command line options is different from the way most people would expect them to be used. Normally, a command line option takes a single parameter, and parameters without a preceding option are passed as positional arguments. If an option needs to take multiple items (like a list of files), I would suggest passing them as one delimited string and parsing it with strsplit().

Here's an example using optparse:

library (optparse)
option_list <- list ( make_option (c("-f","--filelist"),default="blah.txt", 
                                   help="comma separated list of files (default %default)")
                     )

parser <-OptionParser(option_list=option_list)
arguments <- parse_args (parser, positional_arguments=TRUE)
opt <- arguments$options
args <- arguments$args

myfilelist <- strsplit(opt$filelist, ",")

print (myfilelist)
print (args)

Here are several example runs:

$ Rscript blah.r -h
Usage: blah.r [options]


Options:
    -f FILELIST, --filelist=FILELIST
        comma separated list of files (default blah.txt)

    -h, --help
        Show this help message and exit


$ Rscript blah.r -f hello.txt
[[1]]
[1] "hello.txt"

character(0)
$ Rscript blah.r -f hello.txt world.txt
[[1]]
[1] "hello.txt"

[1] "world.txt"
$ Rscript blah.r -f hello.txt,world.txt another_argument and_another
[[1]]
[1] "hello.txt" "world.txt"

[1] "another_argument" "and_another"
$ Rscript blah.r an_argument -f hello.txt,world.txt,blah another_argument and_another
[[1]]
[1] "hello.txt" "world.txt" "blah"     

[1] "an_argument"      "another_argument" "and_another"     

Note that strsplit accepts a regular expression as the delimiter. I would suggest something like the following, which lets you use either commas or colons to separate your list:

myfilelist <- strsplit (opt$filelist,"[,:]")
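As a quick check of that pattern outside the script, here is a small base R sketch. The input string is invented for illustration and mixes both delimiters.

```r
filelist <- "hello.txt,world.txt:blah"
# strsplit() returns a list with one element per input string;
# [[1]] extracts the character vector for the single string passed in
myfiles <- strsplit(filelist, "[,:]")[[1]]
print(myfiles)
```

Taking `[[1]]` is a design choice: it yields a plain character vector, whereas the script above keeps the list that strsplit returns.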

I had this same issue. The workaround I developed is to adjust the command line arguments before they are fed to the optparse parser: whitespace-delimited input file names are concatenated using an alternative delimiter, such as a "pipe" character, which is unlikely to be used as part of a file name.

The adjustment is then reversed at the end by splitting on the delimiter using str_split().

Here is some example code:

#!/usr/bin/env Rscript

library(optparse)
library(stringr)

# ---- Part 1: Helper Functions ----

# Function to collapse multiple input arguments into a single string 
# delimited by the "pipe" character
insert_delimiter <- function(rawarg) {
  # Identify index locations of arguments with "-" as the very first
  # character.  These are presumed to be flags.  Prepend with a "dummy"
  # index of 0, which we'll use in the index step calculation below.
  flagloc <- c(0, which(str_detect(rawarg, '^-')))
  # Additionally, append a second dummy index at the end of the real ones.
  n <- length(flagloc)
  flagloc[n+1] <- length(rawarg) + 1
  
  concatarg <- c()
  
  # Counter over the output command line arguments, with multiple input
  # command line arguments concatenated together into a single string as
  # necessary
  ii <- 1
  # Counter over the flag index locations
  for(ij in seq(1,length(flagloc)-1)) {
    # Calculate the index step size between consecutive pairs of flags
    step <- flagloc[ij+1]-flagloc[ij]
    # Case 1: empty flag with no arguments
    if (step == 1) {
      # Ignore dummy index at beginning
      if (ij != 1) {
        concatarg[ii] <- rawarg[flagloc[ij]]
        ii <- ii + 1
      }
    }
    # Case 2: standard flag with one argument
    else if (step == 2) {
      concatarg[ii] <- rawarg[flagloc[ij]]
      concatarg[ii+1] <- rawarg[flagloc[ij]+1]
      ii <- ii + 2
    }
    # Case 3: flag with multiple whitespace delimited arguments (not
    # currently handled correctly by optparse)
    else if (step > 2) {
      concatarg[ii] <- rawarg[flagloc[ij]]
      # Concatenate multiple arguments using the "pipe" character as a delimiter
      concatarg[ii+1] <- paste0(rawarg[(flagloc[ij]+1):(flagloc[ij+1]-1)],
                                collapse='|')
      ii <- ii + 2
    }
  }
  
  return(concatarg)
}

# Function to remove "pipe" character and re-expand parsed options into an
# output list again
remove_delimiter <- function(rawopt) {
  outopt <- list()
  for(nm in names(rawopt)) {
    if (typeof(rawopt[[nm]]) == "character") {
      outopt[[nm]] <- unlist(str_split(rawopt[[nm]], '\\|'))
    } else {
      outopt[[nm]] <- rawopt[[nm]]
    }
  }
  
  return(outopt)
}

# ---- Part 2: Example Usage ----

# Prepare list of allowed options for parser, in standard fashion
option_list <- list(
  make_option(c('-i', '--inputfiles'), type='character', dest='fnames',
              help='Space separated list of file names', metavar='INPUTFILES'),
  make_option(c('-p', '--printvar'), type='character', dest='pvar',
              help='Valid options are "yes" or "no"',
              metavar='PRINTVAR'),
  make_option(c('-s', '--size'), type='integer', dest='sz',
              help='Integer size value',
              metavar='SIZE')
)

# This is the customary pattern that optparse would use to parse command line
# arguments, however it chokes when there are multiple whitespace-delimited
# options included after the "-i" or "--inputfiles" flag.
#opt <- parse_args(OptionParser(option_list=option_list),
#                  args=commandArgs(trailingOnly = TRUE))

# This works correctly
opt <- remove_delimiter(parse_args(OptionParser(option_list=option_list),
                        args=insert_delimiter(commandArgs(trailingOnly = TRUE))))

print(opt)

Assuming the above file were named fix_optparse.R, here is the output:

> chmod +x fix_optparse.R 
> ./fix_optparse.R --help
Usage: ./fix_optparse.R [options]


Options:
    -i INPUTFILES, --inputfiles=INPUTFILES
        Space separated list of file names

    -p PRINTVAR, --printvar=PRINTVAR
        Valid options are "yes" or "no"

    -s SIZE, --size=SIZE
        Integer size value

    -h, --help
        Show this help message and exit


> ./fix_optparse.R --inputfiles fileA.txt fileB.txt fileC.txt --printvar yes --size 10
$fnames
[1] "fileA.txt" "fileB.txt" "fileC.txt"

$pvar
[1] "yes"

$sz
[1] 10

$help
[1] FALSE

>

A minor limitation of this approach is that if any of the other arguments can accept a "pipe" character as valid input, those arguments will not be treated correctly. However, I think you could probably develop a slightly more sophisticated version of this solution to handle that case correctly as well. This simple version works most of the time and illustrates the general idea.
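One possible direction, offered only as an assumption and not as part of the workaround above, is to join on a character that is very unlikely to be typed on a command line, such as the ASCII unit separator. The names `SEP`, `join_values`, and `split_values` are hypothetical, and whether optparse passes such a character through unchanged would still need to be checked.

```r
# Assumption: the ASCII unit separator (octal 037) never occurs in a real argument
SEP <- "\037"

# Join several values into one string, and split such a string back apart
join_values <- function(values) paste0(values, collapse = SEP)
split_values <- function(joined) strsplit(joined, SEP, fixed = TRUE)[[1]]

# A value containing a literal pipe survives the round trip
joined <- join_values(c("fileA.txt", "a|b.txt", "fileC.txt"))
restored <- split_values(joined)
print(restored)
```

In this sketch a pipe inside an argument is no longer mistaken for the delimiter, at the cost of relying on a control character staying out of real input.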
