
Passing arguments to R script in command line (shell/bash): what to do when column names contain tilde (~)

I'm using Rscript to run an R script from bash, and I want to pass arguments through to functions within the script. Specifically, I want to pass arguments that specify:

  • the path to a data file (.csv), and
  • certain column names in that data file.

I run into a problem when the column names include the tilde sign (~). I've tried wrapping the column names in backticks, but that didn't help.

Example

I want to write a script that takes a data file in .csv format and plots a histogram of one variable chosen by the user.

Here's my function:

plot_histogram <- function(path_to_input, x_var) {
  
  data_raw <- read.csv(file = path_to_input)
  
  path_to_output_folder <- dirname(path_to_input)
  
  png(filename = paste0(path_to_output_folder, "/", "output_plot.png"))
  
  hist(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", xlab = "my_var")
  
  dev.off()  # close the PNG device once so the file is written; repeated calls fail on the null device
}

Let's run it on some fake data:

set.seed(123)
df <- data.frame(age = sample(20:80, size = 100, replace = TRUE))

write.csv(df, "some_age_data.csv")

plot_histogram(path_to_input = "some_age_data.csv",
               x_var = "age")

As intended, I get a .png file with the plot, saved to the same directory as the .csv file.

Now adapt the R script so it can be run from the command line:

plot_histogram.R

args <- commandArgs(trailingOnly = TRUE)

## same function as above
plot_histogram <- function(path_to_input, x_var) {
  
  data_raw <- read.csv(file = path_to_input)
  path_to_output_folder <- dirname(path_to_input)
  png(filename = paste0(path_to_output_folder, "/", "output_plot.png"))
  hist(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", xlab = "my_var")
  dev.off()  # close the PNG device once so the file is written; repeated calls fail on the null device
}

plot_histogram(path_to_input = args[1], x_var = args[2])

Then run it from the command line using Rscript:

$ Rscript --vanilla plot_histogram.R /../../../some_age_data.csv "age"

Works too!

However, things break if the column name contains a tilde.

Step 1: create fake data

library(tibble)

set.seed(123)
df <- tibble(`age-blah~value` = sample(20:80, size = 100, replace = TRUE))

write.csv(df, "some_age_data.csv")

Step 2: run with Rscript

$ Rscript --vanilla plot_histogram.R /../../../some_age_data.csv "age-blah~value"

Error in hist.default(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", :
  invalid number of 'breaks'
Calls: plot_histogram -> hist -> hist.default
Execution halted

Bottom Line

When using Rscript, how can I pass an argument that specifies a column name containing a tilde? Alternatively, within the framework of Rscript, how can I work with .csv files whose column names contain tildes like this?

Thanks!

You are successfully passing an argument that specifies a column name containing a tilde. However, read.csv has "fixed" the column names, so the column in the data frame no longer contains a tilde.
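The error comes from hist rather than from the column lookup because `[[` on a data frame returns NULL for a name that doesn't exist instead of raising an error. Here is a minimal sketch of that chain, using a made-up three-row data frame (illustrative values, not the asker's file):

```r
# Illustrative data frame whose column has already been renamed on import
data_raw <- data.frame(age.blah.value = c(25, 40, 61))

# Looking up the original name (with hyphen and tilde) finds nothing
col <- data_raw[["age-blah~value"]]
is.null(col)

# na.omit(NULL) is still NULL, and as.numeric(NULL) is a zero-length vector,
# so this is what hist() ends up receiving
x <- as.numeric(na.omit(col))
length(x)

# Capture the resulting error message instead of halting (plot = FALSE avoids
# opening a graphics device)
msg <- tryCatch(hist(x, plot = FALSE), error = function(e) conditionMessage(e))
msg
```

So the argument reaches the script intact; the mismatch is between the name passed in and the names in the data frame.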

read.csv is silently converting the column name to age.blah.value. Use check.names = FALSE to keep it as age-blah~value.

data_raw <- read.csv(file = path_to_input, check.names = FALSE)
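To confirm the behaviour, here is a small self-contained sketch that writes a throwaway CSV to a temporary file (the file and its three values are made up for illustration). It also adds a guard that is not part of the answer above: a hypothetical check that stops with a readable message when the requested column is missing.

```r
# Write a tiny CSV whose header contains a hyphen and a tilde
tmp <- tempfile(fileext = ".csv")
writeLines(c("age-blah~value", "25", "40", "61"), tmp)

# Default behaviour: header names are made syntactically valid
default_names <- names(read.csv(tmp))
default_names   # "age.blah.value"

# With check.names = FALSE the header is kept verbatim
data_raw <- read.csv(tmp, check.names = FALSE)
names(data_raw) # "age-blah~value"

# Hypothetical guard (an addition for illustration): fail early and clearly
# if the requested column is not in the file
x_var <- "age-blah~value"
if (!x_var %in% names(data_raw)) {
  stop("Column '", x_var, "' not found. Available columns: ",
       paste(names(data_raw), collapse = ", "))
}
x <- as.numeric(na.omit(data_raw[[x_var]]))

unlink(tmp)     # remove the temporary file
```

If changing the read.csv call is not an option, translating the argument with make.names(x_var) should give the same name that read.csv produces by default, which is another way to make the lookup match.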

