I have a long HTML string (a character vector of length 1; class and mode "character"):
......uygdasd class="vip" title="Click this link to access The Big Bang Theory: The Complete Fourth Season (DVD, 2011, 3-Disc Set).....
Is it possible to extract part of that string based on the text around it? I want only the text between class="vip" title="Click this link to access and (DVD, 2011, so that the result is:
The Big Bang Theory: The Complete Fourth Season
Thanks for any help.
Use grouping operators, (). The pattern below splits the string into three groups, and the replacement "\\2" keeps only the second one: everything up to and including "link to access " is thrown away, as is everything from " (DVD," onward. The expression .+ means "one or more of any character". See the ?regex help page for further details about the anchors "^" and "$" and the use of backreferences \\N (written "\\1", "\\2", ... inside an R string) in replacements:
htxt <- 'uygdasd class="vip" title="Click this link to access The Big Bang Theory: The Complete Fourth Season (DVD, 2011, 3-Disc Set).....'
# group 1: prefix, group 2: title, group 3: " (DVD," to the end; keep only group 2
gsub(pattern= "^(.+link to access )(.+)( \\(DVD,.+$)", "\\2", htxt)
[1] "The Big Bang Theory: The Complete Fourth Season"
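One caveat with the sub()/gsub() approach: when the pattern does not match, the input comes back unchanged, so a string without the expected markers silently yields the whole HTML. The sketch below is a minimal variant, assuming the title is always preceded by "link to access " and followed by " (DVD,". extract_title is a hypothetical helper name, and perl = TRUE is used so the lazy quantifier .+? and the lookarounds behave as in PCRE. The second part shows an alternative with regmatches() and regexpr(), which returns only the matched text (character(0) when nothing matches).

```r
# Hypothetical helper: extract the title between "link to access " and " (DVD,".
# Assumes both markers are present; non-matching elements become NA.
extract_title <- function(x) {
  pattern <- "^.*link to access (.+?) \\(DVD,.*$"
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # sub() leaves non-matching strings unchanged, so mark them as NA explicitly
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

htxt <- 'uygdasd class="vip" title="Click this link to access The Big Bang Theory: The Complete Fourth Season (DVD, 2011, 3-Disc Set).....'
extract_title(htxt)
# [1] "The Big Bang Theory: The Complete Fourth Season"

# Alternative: match only the title, using a lookbehind and a lookahead
m <- regexpr("(?<=link to access ).+?(?= \\(DVD,)", htxt, perl = TRUE)
regmatches(htxt, m)
# [1] "The Big Bang Theory: The Complete Fourth Season"
```

Because extract_title is vectorised, it can be applied directly to a character vector of many such strings.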
There is, of course, the famous, highly voted answer to the question "RegEx match open tags except XHTML self-contained tags", which argues against parsing HTML with regular expressions.