How to add quotes to every 2nd word in a string in R

I want to add double quotes around every second word in this single string.

From this

gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; 
gene_type protein_coding; gene_name CD45A;

to this

gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; 
gene_type "protein_coding"; gene_name "CD45A";

I have been looking through tidyverse and stringr but have not yet found a good way to do this.

Thanks!

Here's a way to split the string apart, add the quotes to every other item, and paste it back together.

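x = "gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; gene_type protein_coding; gene_name CD45A;"
# Split on spaces: odd positions are keys, even positions are values (each ending in ";")
x = unlist(strsplit(x, " "))
evens = seq(2, length(x), by = 2)
# Add the opening quote, then turn the trailing ; into ";
x[evens] = paste0('"', x[evens])
x[evens] = sub(';', '";', x[evens], fixed = TRUE)
# Reassemble the pieces into a single string
x = paste(x, collapse = " ")
cat(x)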
# gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; gene_type "protein_coding"; gene_name "CD45A";
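
For comparison, the same result can probably be reached without splitting at all, using a single regular expression in base R. This is only a sketch: it assumes every value is one token sitting between a space and a semicolon, as in the example string, and the name quoted is just illustrative.

```r
x <- "gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; gene_type protein_coding; gene_name CD45A;"

# Assumption: a value is a run of characters (no spaces, no semicolons)
# that follows a space and is terminated by ";".
# The capture group \\1 holds the value; the replacement wraps it in quotes.
quoted <- gsub(" ([^ ;]+);", ' "\\1";', x)

cat(quoted, "\n")
```

If a value could itself contain spaces or semicolons, this pattern would not be enough and the split-based approach above would need adjusting as well.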

Here is an approach that combines stringr with base R.

First remove the ; at the end of the string, then split the gene information on "; ", then split each piece again on the space " " and save the result to a new list vec_apply .

After that, paste the unmodified keys back together with the modified values (the strings that now carry double quotes).

Note that when the result is printed in the console, each double quote is shown preceded by a backslash \ because printing "escapes" the double quote. The backslashes are not part of the string: once you save the vector to a text file, they are gone.

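library(stringr)  # provides str_split_fixed(), str_count() and the %>% pipe

vec <- c("gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; gene_type protein_coding; gene_name CD45A;")

# Drop the trailing ";" so that splitting on "; " leaves no empty piece
vec <- gsub(";$", "", vec)

# Split into "key value" pairs, then split each pair on the space
vec_apply <- str_split_fixed(vec, "; ", n = str_count(string = vec, pattern = ";") + 1) %>% 
  strsplit(split = " ")

# Paste each key back together with its quoted value followed by ";"
paste(sapply(vec_apply, `[[`, 1), 
      sapply(vec_apply, function(x) paste0(shQuote(x[[2]], type = "cmd"), ";")), collapse = " ")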

Output in console

"gene_id \"ENSG00000081237\"; gene_version \"20\"; transcript_id \"ENST00000442510\"; transcript_version \"8\"; gene_type \"protein_coding\"; gene_name \"CD45A\";"

Output in text file

gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; gene_type "protein_coding"; gene_name "CD45A";
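
The difference between the console output and the file output can be checked with a small sketch like the one below. It uses a shortened, hypothetical string s and a temporary file, so nothing here depends on vec_apply; the variable names are only for illustration.

```r
# Shortened example string (assumption: stands in for the full result above)
s <- 'gene_id "ENSG00000081237";'

# print() shows the escaped representation: [1] "gene_id \"ENSG00000081237\";"
print(s)

# The string itself contains no backslash character
has_backslash <- grepl("\\", s, fixed = TRUE)

# Write to a temporary text file and read it back
tmp <- tempfile(fileext = ".txt")
writeLines(s, tmp)
read_back <- readLines(tmp)

# The file holds the plain double quotes, identical to the original string
identical(read_back, s)
```

Here has_backslash should be FALSE, which is why the text file shows plain double quotes.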

Or, as suggested by @GregorThomas in another answer, use cat() to print the string without escapes and confirm that the double quotes were added.

cat(paste(sapply(vec_apply, `[[`, 1), 
          sapply(vec_apply, function(x) paste0(shQuote(x[[2]], type = "cmd"), ";")), collapse = " "))

gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; gene_type "protein_coding"; gene_name "CD45A";
