
Regular expression in R for a word of variable length between two characters

How do I extract the word wordofvariablelength from the string below?

<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">

I was able to get the first part of the string using the code below, but is there a regular expression I can use to get only the word immediately after "browse/" and before "\\", which here is "wordofvariablelength"?

mystring = substr(mystring,nchar("<a href=\"http://www.thesaurus.com/browse/")+1,nchar("<a href=\"http://www.thesaurus.com/browse/")+20)

Note that the word wordofvariablelength could be of any length, so I cannot hardcode a start and end position.

Try

sub('.*?\\.com/[^/]*\\/([a-z]+).*', '\\1', mystring)
#[1] "wordofvariablelength"

Or

library(stringr)
# stringr's ICU engine supports look-behind directly; the old perl() wrapper is no longer needed
str_extract(mystring, '(?<=browse/)[A-Za-z]+')
#[1] "wordofvariablelength"

data

mystring <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
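As a hedged aside, the capture-group idea behind the sub() call above can also be written with base R's regexec(), which returns the full match followed by each parenthesised group. This is a minimal sketch; the shortened input string and the variable names m and word are illustrative and not part of the original answer.

```r
# Shortened, illustrative version of the input string
mystring <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\">"

# regexec() records the position of the whole match and of each capture group
m <- regmatches(mystring, regexec("browse/([A-Za-z]+)", mystring))

# m[[1]][1] is the whole match, m[[1]][2] is the first capture group
word <- m[[1]][2]
word
```

The pattern here assumes the word consists only of ASCII letters, as in the answer above; widen the character class if digits or hyphens can occur.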

Using the regmatches function:

> x <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
> regmatches(x, regexpr('.*?"[^"]*/\\K[^/"]*(?=")', x, perl=TRUE))
[1] "wordofvariablelength"

OR

> regmatches(x, regexpr('[^/"]*(?="\\s+class=")', x, perl=TRUE))
[1] "wordofvariablelength"

OR

A much simpler one using gsub:

> gsub('.*/|".*', "", x)
[1] "wordofvariablelength"
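One caveat with the gsub approach above: the greedy .*/ removes everything up to the last slash in the string, so it depends on no further "/" appearing after the word. The sketch below uses a hypothetical input (x2, with a closing </a> tag, not taken from the question) to show the difference, and anchors on "browse/" instead.

```r
# Hypothetical input that also contains a closing tag with a slash
x2 <- "<a href=\"http://www.adrive.com/browse/word\" class=\"next-button\">text</a>"

# The greedy '.*/' now runs up to the slash in '</a>', so the word is lost
greedy <- gsub('.*/|".*', "", x2)

# Anchoring on 'browse/' and capturing up to the next quote or slash is more robust
anchored <- sub('.*browse/([^"/]*).*', "\\1", x2)

greedy
anchored
```

For the exact string in the question both forms give the same answer; the anchored pattern only matters once the surrounding markup varies.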

You can use this regex:

/browse\/(.*?)\\/g

Demo here: https://regex101.com/r/gX4dC0/1

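You can use the regex (?<=browse/).*?(?=") . It means: check that we have browse/ , then lazily take all subsequent characters up to (but without consuming) the next double quote. Note that the \" sequences in the R string literal are escapes for the quote character, so the string itself contains no backslashes.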

Sample code:

mystr <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
regmatches(mystr, regexpr('(?<=browse/).*?(?=\\")', mystr, perl=TRUE))

perl=TRUE selects the Perl-compatible regex flavor, which supports fixed-width look-behind such as (?<=browse/) .

Output:

[1] "wordofvariablelength"
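When the same look-behind is applied to several strings at once, regmatches() silently drops the elements that do not match, so the result can be shorter than the input. The following is a minimal sketch of one way to keep the positions aligned; the links vector and its three example strings are made up for illustration.

```r
# Illustrative inputs: two links with a word after 'browse/', one without
links <- c(
  "<a href=\"http://www.adrive.com/browse/alpha\" class=\"next-button\">",
  "<a href=\"http://www.adrive.com/browse/betagamma\" class=\"next-button\">",
  "<a href=\"http://www.adrive.com/about\" class=\"next-button\">"
)

# regexpr() returns -1 for elements with no match
m <- regexpr('(?<=browse/)[^"]*', links, perl = TRUE)

# Pre-fill with NA, then write the matches back into the matching positions
words <- rep(NA_character_, length(links))
words[as.vector(m) != -1] <- regmatches(links, m)
words
```

Pre-filling with NA keeps words the same length as links, which is convenient when the result goes back into a data frame column.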
