
Gsub and regular expression

I have a web page. The HTML source contains this text:

<meta property="og:title" content="John"/>

John is just an example; the name may vary. I am sure that og:title appears only once in the text. This is my code:

$browser.goto( url )
x = $browser.html.gsub( /^.*<meta property="og:title" content="(.+?)".>/m, '\1' )

I expected to find the name John in my variable x. The '\1' should give me the first part I put in parentheses, i.e. (.+?), i.e. John, right? Also, I used a dot . to match the slash /; is there a better way?

Using Watir API:

x = browser.meta.attribute_value "content"

I was not able to access the meta element using either css or xpath.

That code returns the entire HTML, with the matched portion (everything from the start of the string up to and including the />) replaced by 'John'. The result is therefore "John" followed by whatever HTML came after the /> of that meta tag.
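To make that behaviour concrete, here is a minimal sketch that runs the question's pattern against a plain string instead of a live page. The sample markup around the meta tag is invented for illustration and stands in for $browser.html.

```ruby
# Hypothetical stand-in for $browser.html; only the meta tag comes from the question.
html = <<~HTML
  <html>
  <head>
  <meta property="og:title" content="John"/>
  <title>Example</title>
  </head>
  </html>
HTML

# The pattern from the question, unchanged.
pattern = /^.*<meta property="og:title" content="(.+?)".>/m

# gsub swaps the matched span for the capture and keeps everything after it.
replaced = html.gsub(pattern, '\1')

# match returns a MatchData object; index 1 is the first capture group only.
captured = html.match(pattern)[1]

puts replaced
puts captured
```

Here replaced begins with "John" and still contains the title element and closing tags, while captured holds only the name.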

If you only want to extract the name, and that tag occurs only once, you can use something like:

$browser.html =~ /<meta property="og:title" content="(.+?)"/
x = $1

If you only want the value of content:

html = '<meta property="og:title" content="John"/>'
=> "<meta property=\"og:title\" content=\"John\"/>"
html[/property="og:title" content="([^"]+)"/, 1]
=> "John"

If you're not familiar with regex, "([^"]+)" might throw you. [^"]+ matches one or more characters that are not a double quote, so the pattern means: starting after the first ", grab everything up to the next ". In effect, it means "grab everything inside the double quotes".
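The difference between ([^"]+) and the lazy (.+?) shows up when the pattern continues past the closing quote. The sketch below is only an illustration: the extra data-x attribute is invented, and it assumes a pattern that insists on "/>" directly after the value.

```ruby
# Hypothetical tag with an extra attribute after content.
html = '<meta property="og:title" content="John" data-x="1"/>'

# The lazy group may run across quotes until "/> finally matches.
lazy = html[/content="(.+?)"\/>/, 1]

# The negated class can never cross a quote, so this stricter pattern fails.
negated_strict = html[/content="([^"]+)"\/>/, 1]

# Without the trailing "/>", the negated class stops at the next quote.
negated = html[/content="([^"]+)"/, 1]

puts lazy.inspect
puts negated_strict.inspect
puts negated.inspect
```

Here lazy swallows part of the next attribute, negated_strict is nil, and negated is "John", which is why the negated class is the safer choice for quoted attribute values.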
