
Passing arguments to R script in command line (shell/bash): what to do when column names contain tilde (~)

I'm using Rscript to run an R script from bash, and I want to pass command-line arguments through to functions inside the script. Specifically, I want to pass arguments that specify:

  • path to data file ( .csv ) and
  • certain column names in that data file.

I run into a problem when the column names include the tilde sign (~). I've tried wrapping the column names in backticks, but that didn't help.

Example

I want to write a script that takes in a data file in .csv format and plots a histogram for one variable according to the user's choice.

Here's my function:

plot_histogram <- function(path_to_input, x_var) {
  
  data_raw <- read.csv(file = path_to_input)
  
  path_to_output_folder <- dirname(path_to_input)
  
  png(filename = paste0(path_to_output_folder, "/", "output_plot.png"))
  
  hist(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", xlab = "my_var")
  
  dev.off()  # close the png device once; repeated calls fail on the null device
}

Let's run it on some fake data

set.seed(123)
df <- data.frame(age = sample(20:80, size = 100, replace = TRUE))

write.csv(df, "some_age_data.csv")

plot_histogram(path_to_input = "some_age_data.csv",
               x_var = "age")

As intended, I get a .png file with the plot, saved to the same directory as the .csv file.

Now customize an R script to be run from command line

plot_histogram.R

args <- commandArgs(trailingOnly = TRUE)

## same function as above
plot_histogram <- function(path_to_input, x_var) {
  
  data_raw <- read.csv(file = path_to_input)
  path_to_output_folder <- dirname(path_to_input)
  png(filename = paste0(path_to_output_folder, "/", "output_plot.png"))
  hist(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", xlab = "my_var")
  dev.off()  # close the png device once; repeated calls fail on the null device
}

plot_histogram(path_to_input = args[1], x_var = args[2])

Then run via command line using Rscript

$ Rscript --vanilla plot_histogram.R /../../../some_age_data.csv "age"

Works too!

However, things break if the column name contains a tilde

Step 1: create fake data

library(tibble)

set.seed(123)
df <- tibble(`age-blah~value` = sample(20:80, size = 100, replace = TRUE))

write.csv(df, "some_age_data.csv")

Step 2: run it with Rscript:

$ Rscript --vanilla plot_histogram.R /../../../some_age_data.csv "age-blah~value"

Error in hist.default(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", :
  invalid number of 'breaks'
Calls: plot_histogram -> hist -> hist.default
Execution halted

Bottom Line

When using Rscript, how can I pass an argument that specifies a column name containing a tilde? Alternatively, how can I work with .csv files whose column names contain a tilde, while still using Rscript?

Thanks!

You are passing the argument correctly, tilde included. The problem is that read.csv has "fixed" the column names, so the data frame no longer has a column whose name contains a tilde.
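To see why the error complains about 'breaks' rather than a missing column, here is a minimal sketch. It assumes the column was renamed to something like age.blah.value, and uses a small made-up data frame in place of the real file. Looking up a name that doesn't exist with [[ ]] returns NULL instead of failing, so hist() ends up with an empty vector.

```r
# Made-up stand-in for what read.csv returns by default (renamed column).
data_raw <- data.frame(age.blah.value = c(25, 40, 61))

x_var  <- "age-blah~value"             # the name passed on the command line
column <- data_raw[[x_var]]            # no such column, so this is NULL
values <- as.numeric(na.omit(column))  # NULL becomes an empty numeric vector

print(is.null(column))
print(length(values))

# hist() cannot choose breaks for an empty vector; capture the message
# instead of halting (plot = FALSE avoids opening a graphics device).
msg <- tryCatch(
  {
    hist(values, plot = FALSE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the failure happens two steps after the real cause, which is why the message doesn't mention the column name at all.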

By default (check.names = TRUE), read.csv runs the header through make.names, which silently converts the column name to age.blah.value. Use check.names = FALSE to keep it as age-blah~value:

data_raw <- read.csv(file = path_to_input, check.names = FALSE)
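The difference can be checked on a throwaway file. The sketch below is illustrative only: the temp file stands in for the real .csv, and get_column is a hypothetical helper (not part of the original script) that fails early with a readable message if the requested column is absent.

```r
# Throwaway CSV whose header contains '-' and '~' (stand-in for the real file).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("age-blah~value", "25", "40", "61"), csv_path)

names_default <- names(read.csv(csv_path))                       # header sanitised
names_kept    <- names(read.csv(csv_path, check.names = FALSE))  # header kept verbatim
print(names_default)
print(names_kept)

# Hypothetical helper: stop with a clear message when the column is missing,
# instead of letting NULL flow on into hist().
get_column <- function(data, x_var) {
  if (!x_var %in% names(data)) {
    stop("Column '", x_var, "' not found. Available columns: ",
         paste(names(data), collapse = ", "))
  }
  data[[x_var]]
}

data_raw <- read.csv(csv_path, check.names = FALSE)
values   <- get_column(data_raw, "age-blah~value")
print(values)

unlink(csv_path)  # clean up the temp file
```

With a check like this inside plot_histogram, a mistyped or renamed column name would be reported along with the names that are actually available.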
