R string and subset

Question

I have a long HTML string with:

Length - 1
Class and Mode - character

......uygdasd class="vip" title="Click this link to access The Big Bang Theory: The Complete Fourth Season (DVD, 2011, 3-Disc Set).....

Is it possible to extract part of that string based on the text in it? I want to strip everything up to and including class="vip" title="Click this link to access and everything from (DVD, 2011 onward, so that the result is:

The Big Bang Theory: The Complete Fourth Season

Thanks for any help.

Answer 1

Use the grouping operators (). The pattern below splits the string into three groups: everything up to and including "link to access ", the title, and everything from " (DVD," onward. The replacement keeps only the match for the second group and throws the rest away. The expression .+ matches one or more characters of any kind. See the ?regex help page for further details about the interpretation of "^" and "$" and the use of \\N backreferences in replacements:

htxt <- 'uygdasd class="vip" title="Click this link to access The Big Bang Theory: The Complete Fourth Season (DVD, 2011, 3-Disc Set).....'

gsub(pattern = "^(.+link to access )(.+)( \\(DVD,.+$)", "\\2", htxt)
[1] "The Big Bang Theory: The Complete Fourth Season"
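Because .+ is greedy, the pattern above anchors on the last "link to access " and the last " (DVD," in the string. If that matters for your input, one alternative is to extract the match directly rather than replacing around it. This is only a sketch, not part of the original answer: it assumes the title always sits between the literal text "link to access " and " (DVD,", and it uses Perl-style lookarounds (perl = TRUE) with a lazy .+? so the match stops at the first " (DVD,". The variable names m and title are made up for illustration.

```r
htxt <- 'uygdasd class="vip" title="Click this link to access The Big Bang Theory: The Complete Fourth Season (DVD, 2011, 3-Disc Set).....'

# (?<=...) requires the fixed prefix before the match without consuming it;
# .+? is lazy, so it stops at the first place the lookahead (?= \(DVD,) succeeds.
m <- regexpr("(?<=link to access ).+?(?= \\(DVD,)", htxt, perl = TRUE)

# regmatches() returns the matched substring, or character(0) if nothing matched.
title <- regmatches(htxt, m)
title
```

Unlike the gsub() version, which returns the input unchanged when the pattern does not match, this returns an empty character vector, so a failed match is easier to detect.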

There is, of course, the famous, highly voted answer on parsing HTML with regular expressions: RegEx match open tags except XHTML self-contained tags

solution1
2 ACCEPTED 2015-04-29 20:28:00
