EDIT:
I would like to place a \n before a specific unknown word in my text. I know that the first time the unknown word appears in my text, it will be between "Tree" and "Lake".
Ex. of text:
text
[1] "TreeRULakeSunWater"
[2] "A B C D"
EDIT:
"Tree" and "Lake" will never change, but the word in between them is always changing so I do not look for "RU" in my regex
What I am currently doing:
if (grepl(".*Tree\\s*|Lake.*", text)) { text <- gsub(".*Tree\\s*|Lake.*", "\n\\1", text)}
The problem with what I am doing above is that the gsub will substitute all of text and leave just \nRU.
text
[1] "\nRU"
I have also tried:
if (grepl(".*Tree *(.*?) *Lake.*", text)) { text <- gsub(".*Tree *(.*?) *Lake.*", "\n\\1", text)}
What I would like text to look like after gsub:
text
[1] "Tree \nRU LakeSunWater"
[2] "A B C D"
EDIT:
From Wiktor Stribizew's comment, I am able to do a successful gsub:
gsub("Tree(\\w+)Lake", "Tree \n\\1 Lake", text)
But this will only do a gsub on occurrences where "RU" is between "Tree" and "Lake", which is the first occurrence of the unknown word. The unknown word (in this case "RU") will show up many times in the text, and I would like to place \n in front of every occurrence of "RU" when "RU" is a whole word.
New ex. of text:
text
[1] "TreeRULakeSunWater"
[2] "A B C RU D"
New ex. of what I would like:
text
[1] "Tree \nRU LakeSunWater"
[2] "A B C \nRU D"
Any help will be appreciated. Please let me know if further information is needed.
You need to find the unknown word between "Tree" and "Lake" first. You can use
unknown_word <- gsub(".*Tree(\\w+)Lake.*", "\\1", text)
The pattern matches any characters up to the last Tree in a string, then captures the unknown word (\\w+ = one or more word characters) up to Lake, and then matches the rest of the string. gsub is applied to every string in the vector; elements that contain no match are returned unchanged. The extracted word is in the first element, which you can access with the [[1]] index.
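To make the extraction step concrete, here is a minimal sketch using the two-element sample vector from the question. It assumes, as the question states, that the unknown word sits directly between "Tree" and "Lake" and consists only of word characters.

```r
# Sample input taken from the question
text <- c("TreeRULakeSunWater", "A B C D")

# Replace each whole string with the captured word between Tree and Lake.
# Strings that do not match the pattern come back unchanged.
unknown_word <- gsub(".*Tree(\\w+)Lake.*", "\\1", text)

# unknown_word is now c("RU", "A B C D"):
# only the first element holds the extracted word.
first_word <- unknown_word[[1]]  # "RU"
print(first_word)
```

Because non-matching elements pass through untouched, the rest of the vector should not be treated as extracted words; only the element known to contain Tree...Lake is meaningful here.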
Then, when you know the word, replace it with
gsub(paste0("[[:space:]]*(", unknown_word[[1]], ")[[:space:]]*"), " \n\\1 ", text)
See IDEONE demo.
Here, the pattern is [[:space:]]*( + unknown_word[[1]] + )[[:space:]]*. It matches the unknown word itself (captured into Group 1) together with zero or more whitespace characters on both sides. In the replacement, the whitespace is shrunk to a single space on each side (or added if there was none), \n inserts the newline, and \\1 restores the unknown word. You may replace [[:space:]] with \\s.
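Putting the two steps together, the following sketch runs the extraction and the replacement on the second sample vector from the question. The variable names pattern and result are chosen here for illustration and are not part of the original answer.

```r
# Second sample input from the question
text <- c("TreeRULakeSunWater", "A B C RU D")

# Step 1: extract the unknown word from the element containing Tree...Lake
unknown_word <- gsub(".*Tree(\\w+)Lake.*", "\\1", text)

# Step 2: build the pattern around the extracted word ("RU" here)
pattern <- paste0("[[:space:]]*(", unknown_word[[1]], ")[[:space:]]*")

# Step 3: surround every occurrence with " \n" before and " " after
result <- gsub(pattern, " \n\\1 ", text)

# result is c("Tree \nRU LakeSunWater", "A B C \nRU D")
print(result)
```

Note that this version matches the word anywhere, including inside longer words, which is why the first element is changed as well.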
UPDATE
If you only need to add a newline before occurrences of RU that are whole words, use the \\b word boundary:
> gsub(paste0("[[:space:]]*\\b(", unknown_word[[1]], ")\\b[[:space:]]*"), " \n\\1 ", text)
[1] "TreeRULakeSunWater" "A B C \nRU D"
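With the word boundary, the occurrence inside TreeRULake is skipped because it is not a whole word. If the output requested in the question is wanted (a newline both at the Tree...Lake occurrence and before every whole-word occurrence), one possible approach is to chain the whole-word replacement with the Tree(\\w+)Lake replacement quoted in the question. This is only a sketch under that assumption; the names word, whole_word and result are illustrative.

```r
text <- c("TreeRULakeSunWater", "A B C RU D")

# Extract the unknown word from the Tree...Lake occurrence
word <- gsub(".*Tree(\\w+)Lake.*", "\\1", text)[[1]]

# First pass: newline before whole-word occurrences only
whole_word <- paste0("[[:space:]]*\\b(", word, ")\\b[[:space:]]*")
result <- gsub(whole_word, " \n\\1 ", text)

# Second pass: handle the occurrence glued between Tree and Lake
result <- gsub("Tree(\\w+)Lake", "Tree \n\\1 Lake", result)

# result is c("Tree \nRU LakeSunWater", "A B C \nRU D")
cat(result, sep = "\n")
```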