
R - Automatically find string matches in multiple R scripts

This is a bit of a weird question, and I am not sure how best to word it, so please bear with me.

Background:
We have a Shiny app which uses the shiny.i18n package to translate the app into several languages. We have multiple developers working on this app, and sometimes they do not enter the text that should be translated into the translate.json file, which means someone has to go through all of the app's scripts to check that everything is in the JSON file. Our app currently contains 384 R scripts in total, and going through them all takes days. This is really a massive app.

Problem
I am hoping to automate this task somehow. Ideally I would like to read in a list of all the R scripts, e.g. using list.files(...):

r_scripts <- list.files(
    path = "/path/to/scripts",
    pattern = "\\.R$",
    recursive = TRUE,
    full.names = TRUE
)

Then take this list of R scripts, read each one, and add it to a vector, e.g.:

code_vctr <- character()

for (i in seq_along(r_scripts)) {

    # c() appends the lines; cat() only prints and returns NULL
    code_vctr <- c(
      code_vctr,
      readLines(
        r_scripts[i]
      )
    )
}
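One way to keep track of which file each piece of code came from is to read every script into a single string and name the result by file path. The following is only a sketch under the assumption that the scripts are plain text; the helper name read_scripts and the temporary demo files are made up for illustration.

```r
# Hypothetical helper: read every script into one string per file,
# with the lines of each file joined by newlines.
read_scripts <- function(paths) {
  vapply(
    paths,
    function(p) paste(readLines(p, warn = FALSE), collapse = "\n"),
    character(1)
  )
}

# Small demonstration with two temporary files (made-up content)
demo_dir <- tempfile("scripts")
dir.create(demo_dir)
writeLines(c("x <- 1", "y <- 2"), file.path(demo_dir, "a.R"))
writeLines("z <- 3", file.path(demo_dir, "b.R"))

demo_files <- list.files(demo_dir, pattern = "\\.R$", full.names = TRUE)

# Named character vector: names are the file paths, values are the code
code_by_file <- read_scripts(demo_files)
```

Keeping each file as one string means a call that spans several lines can still be matched in one piece later on.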

Then, after I have somehow concatenated the scripts together into one massive vector, I need some way to search for the text that is within translate()$t(...). shiny.i18n uses the function translate()$t() to translate whatever is between the brackets into the language of the user's choice. So if the code reads translate()$t("This should be translated"), then it looks in the translate.json file for the matching string "This should be translated" and replaces it with the other language's string, e.g. French: "Cela devrait être traduit".

How can I then search for the text string that sits in between the brackets of translate()$t(...)? An example of such code would be:

    infoBox(
      translate()$t(
        "Error"
      ),
      subtitle = translate()$t(
        "Failed to get this code to work"
      ),
      icon = icon(
        "thumbs-down",
        lib = "glyphicon"
      ),
      fill = TRUE,
      color = "red"
    )

The code could also contain a paste0 call for longer strings. However, just being able to get the text between the brackets, regardless of whether paste0 is included, would be super helpful.

    infoBox(
      translate()$t(
        "Warning"
      ),
      subtitle = translate()$t(
        paste0(
          "This is a very very long text string",
          "it continues on, but already just being ",
          "able to the text inbetween the translate ",
          "brackets, regardless of whether it contains ",
          "paste0 or not, would still be super helpful."
        )
      ),
      icon = icon(
        "thumbs-down",
        lib = "glyphicon"
      ),
      fill = TRUE,
      color = "red"
    )

Ideally, I would like to get a data frame containing all the text, which I can then use to search for matches in the translate.json file to see which ones are missing.
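That final comparison could be sketched as follows, assuming the strings found in the code and the keys already in translate.json are each available as character vectors. All names and sample values here are made up for illustration; in practice the keys might be loaded with a JSON reader such as jsonlite::fromJSON().

```r
# Strings found in the code (made-up sample values)
found_in_code <- c("Error", "Warning", "Failed to get this code to work")

# Keys already present in translate.json (made-up sample values)
json_keys <- c("Error", "Warning")

# One row per string, with a flag showing whether a translation key exists
translation_check <- data.frame(
  text = found_in_code,
  in_json = found_in_code %in% json_keys,
  stringsAsFactors = FALSE
)

# Strings that still need an entry in translate.json
missing_strings <- translation_check$text[!translation_check$in_json]
print(missing_strings)
```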

Please note, the code example I have above does not really work all that well. I cannot seem to get a very good working example...

Any advice would be GREATLY appreciated! Thank you in advance.

I believe what you want to do can be achieved in 4 steps:

  1. Duplicate the files.
r_scripts <- list.files(
    path = "/path/to/scripts",
    pattern = "\\.R$",
    recursive = TRUE,
    full.names = TRUE
)
# duplicate the files in a new folder (create the new folder first)

new.folder <- 'H:(insert location here)'

file.copy(r_scripts, new.folder)
  2. Turn these duplicates into text files so they can be read into R easily.
# build the paths of the copies, then the new .txt names
copied_scripts <- file.path(new.folder, basename(r_scripts))
new_r_scripts <- sub(pattern = "\\.R$", replacement = ".txt", x = copied_scripts)

# rename the copies, not the originals
file.rename(from = copied_scripts, to = new_r_scripts)
  3. Read all the text files into R.
scripts <- lapply(new_r_scripts, function(x) readChar(x, file.info(x)$size))

# turn the list returned above into a character vector
script_vector <- unlist(scripts)
  4. Find the strings you are looking for using a regular expression with a lookahead and a lookbehind. This uses the stringr package.
# matches a quoted string sitting directly inside translate()$t(...), surrounding whitespace included
stringr::str_extract_all(script_vector, '(?<=translate\\(\\)\\$t\\()\\s*"[^"]*"\\s*(?=\\))')

Note the use of escape characters, which are necessary because some of the characters in your string are regex metacharacters.
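For calls that are spread over several lines, like the ones in the question, the same idea can be sketched in base R. This is only an illustration on a made-up sample string: it assumes each translate()$t(...) call wraps a single double-quoted literal with no escaped quotes inside, and it does not handle paste0().

```r
# Made-up sample, formatted like the code in the question
sample_code <- '
infoBox(
  translate()$t(
    "Error"
  ),
  subtitle = translate()$t(
    "Failed to get this code to work"
  )
)'

# Whole call, allowing whitespace and newlines around the quoted string
call_pattern <- 'translate\\(\\)\\$t\\(\\s*"[^"]*"\\s*\\)'
calls <- regmatches(
  sample_code,
  gregexpr(call_pattern, sample_code, perl = TRUE)
)[[1]]

# Keep only the text between the double quotes of each matched call
texts <- sub('^[^"]*"([^"]*)"[^"]*$', "\\1", calls, perl = TRUE)
print(texts)
```

Matching the whole call first and stripping the quotes in a second step avoids a variable-length lookbehind, which many regex engines reject.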

This is really just an approximation of the answer. I'd bet good money I made a mistake here somewhere, but I found this an interesting question. Good luck, and if you run into an issue, feel free to ask about it and I'll see what I can do.
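A different technique from the regex above is to let R parse the code and walk the resulting expressions, which also makes the paste0() case reachable. The sketch below is only an illustration: the helper names is_translate_call and collect_texts are hypothetical, and it assumes paste0() is called with string literals only. Anything more dynamic is returned as NA so it can be checked by hand.

```r
# Hypothetical helper: TRUE when `e` is a call of the form translate()$t(...)
is_translate_call <- function(e) {
  if (!is.call(e)) return(FALSE)
  f <- e[[1]]
  is.call(f) &&
    identical(f[[1]], as.name("$")) &&
    identical(f[[2]], quote(translate())) &&
    identical(f[[3]], as.name("t"))
}

# Hypothetical helper: collect the text passed to every translate()$t() call
collect_texts <- function(e) {
  if (is_translate_call(e)) {
    if (length(e) < 2) return(NA_character_)
    arg <- e[[2]]
    if (is.character(arg)) return(arg)
    # Assumption: paste0() of string literals is safe to evaluate directly
    if (is.call(arg) && identical(arg[[1]], as.name("paste0")) &&
        all(vapply(as.list(arg)[-1], is.character, logical(1)))) {
      return(eval(arg, baseenv()))
    }
    return(NA_character_)  # dynamic argument; flag for manual review
  }
  out <- character(0)
  for (i in seq_along(e)) {
    # Only descend into nested calls; constants and symbols are skipped
    if (is.call(e[[i]])) out <- c(out, collect_texts(e[[i]]))
  }
  out
}

# Made-up sample; parse() does not run the code, so shiny is not needed
sample_code <- '
infoBox(
  translate()$t("Warning"),
  subtitle = translate()$t(
    paste0("This is a long ", "text string")
  ),
  fill = TRUE
)'

texts <- collect_texts(parse(text = sample_code, keep.source = FALSE))
print(texts)
```

Because the code is parsed rather than pattern-matched, line breaks, indentation and comments inside a call do not affect the result.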

RStudio has built-in functionality for navigating code, including searching for a keyword across multiple files and replacing it. This may help you reduce some of the manual work.

A detailed description is provided in the RStudio support article linked below.

Source: https://support.rstudio.com/hc/en-us/articles/200710523-Navigating-Code
