
Gsub and regular expression

I have a web page. The HTML source contains this text:

<meta property="og:title" content="John"/>

John is an example; the name may vary. I am sure that og:title will appear only once in the text. This is my code:

$browser.goto( url )
x = $browser.html.gsub( /^.*<meta property="og:title" content="(.+?)".>/m, '\1' )

I expected to find the name John in my variable x. The '\1' should give me the first part I put in parentheses, i.e. (.+?), i.e. John, right? Also, I used a dot . to match a slash /; is there a better way?

Using Watir API:

x = browser.meta.attribute_value "content"

I was not able to access the meta element using either css or xpath.

That code will return all of the HTML, with the matched text (everything from the start of the string up to and including the />) replaced by 'John'. So the result is "John", followed by the HTML that came after the /> of that meta tag.
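To see this behaviour in isolation, here is a minimal sketch that runs the same gsub against a made-up HTML string. The `html` sample is an assumption standing in for `$browser.html`; it is not taken from a real page.

```ruby
# Hypothetical sample standing in for $browser.html
html = <<~HTML
  <html><head>
  <meta property="og:title" content="John"/>
  </head><body>Hello</body></html>
HTML

# Same pattern as in the question: with /m the dot also matches newlines,
# so ^.* swallows everything from the start of the string up to the tag.
result = html.gsub(/^.*<meta property="og:title" content="(.+?)".>/m, '\1')

# Only the matched part was replaced; the HTML after the tag is still there.
puts result
```

The printed string begins with John but still carries the trailing markup, which is why gsub is the wrong tool when only the name is wanted.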

If you only want to extract the name, and that tag occurs only once, you can use something like:

@browser.html =~ /<meta property="og:title" content="(.+?)"/
x = $1
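As a self-contained sketch of this approach (the `html` string is an assumed stand-in for the page source), the following also shows that the slash can be matched literally instead of with a dot: either escaped as `\/`, or unescaped inside a `%r{...}` literal.

```ruby
# Hypothetical sample standing in for the browser's HTML
html = '<head><meta property="og:title" content="John"/></head>'

# =~ returns the match position (or nil) and sets $1 to the first group.
# The slash is escaped because / also delimits the regex literal.
x = nil
if html =~ /<meta property="og:title" content="(.+?)"\/>/
  x = $1
end

# Same pattern written with %r{...}, so the slash needs no escaping.
# String#[] with a capture index returns that group directly.
y = html[%r{<meta property="og:title" content="(.+?)"/>}, 1]

puts x
puts y
```

Both forms extract the same value; `%r{...}` is mostly a readability choice when the pattern contains slashes.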

If you only want the value of content:

html = '<meta property="og:title" content="John"/>'
=> "<meta property=\"og:title\" content=\"John\"/>"
html[/property="og:title" content="([^"]+)"/, 1]
=> "John"

If you're not familiar with regex, "([^"]+)" might throw you. It means: from the first ", grab everything until the next ". In effect it means "grab everything inside the double quotes".
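A small sketch of why the negated character class is useful. The `html` string here is a made-up example with an extra attribute after content (not from the original page), to show where each pattern stops.

```ruby
# Hypothetical tag with a second attribute after content
html = '<meta property="og:title" content="John Smith" lang="en"/>'

# [^"]+ matches one or more characters that are not a double quote,
# so the capture stops at the closing quote of content="...".
name = html[/content="([^"]+)"/, 1]

# A greedy .+ would instead run on to the last double quote in the string.
greedy = html[/content="(.+)"/, 1]

puts name
puts greedy
```

Here `name` holds only the attribute value, while `greedy` also picks up the following attribute text.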
