
How to extract string from sentence using regex in R?

I want to extract strings from a sentence using regex in R. I'm new to R and don't know where to begin or how to do it.

string<-c(".\n                Written by\nJ-S-Golden            \n        
\n        \n         \n                Plot Summary\n    |\n        Plot 
Synopsis\n    \n        \n            Plot Keywords:\n wrongful 
imprisonment\n                        |\n escape from prison\n                        
|\n based on the works of stephen king\n                        |\n 
prison\n                        |\n voice over narration\n            | See 
All (296) »      \n        \n            Taglines:\nFear can hold you 
prisoner. Hope can set you free.        \n        \n")

This is the string I have, and the output I want is:

Plot Keywords:
\n wrongful imprisonment\n
|\n escape from prison\n
|\n based on the works of stephen king\n                        
|\n prison\n                        
|\n voice over narration\n            
| See All (296) »      \n        \n

I don't know how to extract clean data from the string. Can someone please help me?

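Here is a solution using base R's sub function. The pattern matches (and keeps) the leading text Plot Keywords:. It then uses a tempered dot to match any character up to, but not including, the next label followed by a colon (here, Taglines:). The (?s) flag lets the dot match newlines, and perl=TRUE is needed because the lookahead is only supported by the PCRE engine.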

sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1", string, perl=TRUE)

[1] "Plot Keywords:\n wrongful \nimprisonment\n
                    |\n escape from prison\n
                    \n|\n based on the works of
     stephen king\n
                    |\n \nprison\n                        |\n voice over narration\n
        | See \nAll (296) »      \n        \n            "
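
If you also want the keywords as a clean character vector, one possible follow-up is sketched below. It is only an illustration under some assumptions: the input txt is a hypothetical, shortened stand-in for your string, and the cleanup steps (splitting on "|", trimming, dropping the "See All" entry) are my guess at what "clean data" means here.

```r
# Hypothetical, shortened input used only for illustration
txt <- paste0(
  "Written by\nJ-S-Golden\n Plot Keywords:\n wrongful imprisonment\n |\n",
  " prison\n | See All (296)\n Taglines:\nFear can hold you prisoner."
)

# Same pattern as above: keep "Plot Keywords:" and everything up to the next "label:"
block <- sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1", txt, perl = TRUE)

# Drop the label, then split the remainder on the literal "|" separator
body  <- sub("^Plot Keywords:", "", block)
parts <- strsplit(body, "|", fixed = TRUE)[[1]]

# Collapse runs of whitespace (including newlines) and trim both ends
parts <- trimws(gsub("\\s+", " ", parts))

# Assumption: empty pieces and the "See All (...)" link are not keywords
keywords <- parts[nzchar(parts) & !grepl("^See All", parts)]
print(keywords)
```

Splitting with fixed = TRUE avoids having to escape the pipe, which would otherwise be read as regex alternation.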

In this particular case, a pure regex demo might be more helpful than an R demo, so here is a link to one:

Demo
