I want to extract part of a string using a regex in R. I'm new to R and don't know where to begin or how to do it.
string<-c(".\n Written by\nJ-S-Golden \n
\n \n \n Plot Summary\n |\n Plot
Synopsis\n \n \n Plot Keywords:\n wrongful
imprisonment\n |\n escape from prison\n
|\n based on the works of stephen king\n |\n
prison\n |\n voice over narration\n | See
All (296) » \n \n Taglines:\nFear can hold you
prisoner. Hope can set you free. \n \n")
Given this string, the output I want is:
Plot Keywords:
\n wrongful imprisonment\n
|\n escape from prison\n
|\n based on the works of stephen king\n
|\n prison\n
|\n voice over narration\n
| See All (296) » \n \n
I don't know how to extract clean data from the string. Can someone please help me?
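Here is a solution using base R's sub function. It matches (and keeps) the leading text Plot Keywords:, then uses a tempered dot to match any character up to, but not including, the next label followed by a colon (here, Taglines:). The perl = TRUE argument is required because the pattern uses a lookahead and the inline (?s) flag, which lets . match newlines.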
sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1", string, perl=TRUE)
[1] "Plot Keywords:\n wrongful \nimprisonment\n
|\n escape from prison\n
\n|\n based on the works of
stephen king\n
|\n \nprison\n |\n voice over narration\n
| See \nAll (296) » \n \n "
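A minimal sketch of the same idea, assuming a shortened, hypothetical sample string (txt) in place of the full scraped text, and assuming the keyword block always ends at the next "Label:". It applies the answer's sub() pattern unchanged, then, as an optional extra step not in the original answer, splits the block on | with strsplit() and trims whitespace with trimws() to get a clean character vector.

```r
# Hypothetical, shortened sample text (not the question's exact string).
txt <- "Plot Summary\n | Plot Synopsis\n Plot Keywords:\n wrongful imprisonment\n |\n escape from prison\n |\n prison\n | See All (296) \n Taglines:\nFear can hold you prisoner."

# Same pattern as the answer:
#   (?s)                let '.' match newlines too
#   .*                  skip everything before the label
#   Plot Keywords:      the literal label we keep
#   (?:(?![^: ]+:).)*   tempered dot: take one char at a time unless a run of
#                       non-space, non-colon chars followed by ':' starts here
#                       (i.e. the next label, such as "Taglines:")
#   .*                  discard the rest
block <- sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1", txt, perl = TRUE)

# Optional clean-up (an assumption about the desired final format):
# drop the label, split on '|', trim whitespace, and remove empty pieces
# and the trailing "See All (...)" link text.
keywords <- sub("^Plot Keywords:", "", block)
keywords <- trimws(strsplit(keywords, "|", fixed = TRUE)[[1]])
keywords <- keywords[keywords != "" & !grepl("^See All", keywords)]

print(block)
print(keywords)
```

With this sample, keywords should hold the three entries "wrongful imprisonment", "escape from prison" and "prison". fixed = TRUE is used in strsplit() because | is a regex metacharacter.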
In this particular case, a pure regex demo might be more helpful than an R demo, so here is a link to one: