How do I extract the word wordofvariablelength from the string below?
<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">
I was able to get the first part of the string using the code below, but is there a regular expression I can use to get only the word immediately after "browse/" and before the closing \", which here is "wordofvariablelength"?
mystring = substr(mystring,nchar("<a href=\"http://www.thesaurus.com/browse/")+1,nchar("<a href=\"http://www.thesaurus.com/browse/")+20)
Note that the word wordofvariablelength could be of any length, so I cannot hardcode a start and end position.
Try
sub('.*?\\.com/[^/]*\\/([a-z]+).*', '\\1', mystring)
#[1] "wordofvariablelength"
Or
library(stringr)
# perl() is deprecated in current stringr; the default ICU engine already supports look-behind
str_extract(mystring, '(?<=browse/)[A-Za-z]+')
#[1] "wordofvariablelength"
# data
mystring <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
Using the regmatches function:
> x <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
> regmatches(x, regexpr('.*?"[^"]*/\\K[^/"]*(?=")', x, perl=TRUE))
[1] "wordofvariablelength"
OR
> regmatches(x, regexpr('[^/"]*(?="\\s+class=")', x, perl=TRUE))
[1] "wordofvariablelength"
OR
A much simpler one using gsub:
> gsub('.*/|".*', "", x)
[1] "wordofvariablelength"
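One caveat with the gsub approach, shown as a hedged sketch: the greedy `.*/` strips everything up to the last slash in the string, so it relies on the URL containing the final slash. The string `x2` below is a hypothetical variant (not from the question) that also has a closing `</a>` tag. Anchoring on `browse/` still finds the word in that case.

```r
# Hypothetical variant of the input that also contains a closing tag.
x2 <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\">next</a>"

# The greedy '.*/' now runs up to the slash in '</a>', so the word is lost.
stripped <- gsub('.*/|".*', "", x2)

# Anchoring on 'browse/' and stopping at the closing quote is unaffected.
anchored <- regmatches(x2, regexpr('(?<=browse/)[^"]+', x2, perl = TRUE))

print(stripped)
print(anchored)
```

If the input may contain further tags or slashes, the anchored pattern is probably the safer choice.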
You can use the regex (?<=browse/).*?(?=\"), written in an R string as '(?<=browse/).*?(?=\\")'. It means: assert that browse/ immediately precedes the match, then take as few subsequent characters as possible up to (but without consuming) the next double quote.
Sample code:
mystr <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
regmatches(mystr, regexpr('(?<=browse/).*?(?=\\")', mystr, perl=TRUE))
perl=TRUE means we are using the Perl-compatible (PCRE) regex flavor, which supports the fixed-width look-behind (?<=browse/).
Output:
[1] "wordofvariablelength"
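When the same extraction has to run over many strings, note that regmatches() with regexpr() silently drops elements that do not match, so the result can be shorter than the input. The sketch below wraps the look-behind pattern in a helper that returns NA for non-matching elements instead. The function name `extract_after_browse` and the sample `inputs` are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: returns the path segment after 'browse/' for each
# element of a character vector, or NA where the pattern is absent.
extract_after_browse <- function(strings) {
  m <- regexpr('(?<=browse/)[^"/]+', strings, perl = TRUE)
  out <- rep(NA_character_, length(strings))
  found <- !is.na(m) & m != -1L      # regexpr() returns -1 for no match
  out[found] <- regmatches(strings, m)
  out
}

# Made-up inputs: two with a 'browse/' segment, one without.
inputs <- c(
  "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\">",
  "<a href=\"http://www.adrive.com/browse/cat\" class=\"next-button\">",
  "<a href=\"http://www.adrive.com/about\">"
)
result <- extract_after_browse(inputs)
print(result)
```

Keeping the output the same length as the input makes it straightforward to assign the result to a data frame column.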