Gsub and regular expression
I have a web page. The HTML source contains this text:
<meta property="og:title" content="John"/>
John is just an example; the name may vary. I am sure that og:title will appear only once in the text. This is my code:
$browser.goto( url )
x = $browser.html.gsub( /^.*<meta property="og:title" content="(.+?)".>/m, '\1' )
I expected to find the name John in my variable x. The '\1' should give me the first part I put in parentheses, i.e. (.+?), i.e. John, right? Also, I used a dot . to match a slash /. Is there a better way?
Using the Watir API:
x = browser.meta.attribute_value "content"
I was not able to access the meta element using either css or xpath.
That code will return all of the HTML, with the matched part (everything from the start of the string up to and including the />) replaced by 'John'. So the result is "John", followed by the HTML that came after the /> of that meta tag.
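To see this behaviour without a browser, here is a minimal sketch that runs the same gsub on a small hand-written page. The html string is an assumption made up for illustration, not the real page source.

```ruby
# Hypothetical page source standing in for $browser.html.
html = <<~HTML
  <html><head>
  <meta property="og:title" content="John"/>
  </head><body>Hello</body></html>
HTML

# Same pattern as in the question: with /m the dot also matches newlines,
# so ^.* consumes everything from the start of the string up to the tag.
replaced = html.gsub(/^.*<meta property="og:title" content="(.+?)".>/m, '\1')

# Only the matched span is swapped for the capture, so the output is
# "John" followed by everything that came after the meta tag.
puts replaced
```

The text after the tag survives because gsub only replaces what the pattern matched; nothing in the pattern covers the rest of the document.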
If you only want to extract the name, and that tag occurs only once, you can use something like:
@browser.html =~ /<meta property="og:title" content="(.+?)"/
x = $1
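As a variation on the same idea, the sketch below uses String#match with a %r{} literal. That lets the closing /> be written literally instead of using a dot for the slash (inside a /.../ literal it would be written \/). The html string is a stand-in for the page source, assumed here for illustration.

```ruby
# Stand-in for @browser.html; the real page would be much longer.
html = '<head><meta property="og:title" content="John"/></head>'

# %r{...} avoids having to escape the slash in "/>".
pattern = %r{<meta property="og:title" content="(.+?)"/>}

m = html.match(pattern)
# match returns nil when the tag is missing, so guard before indexing.
x = m ? m[1] : nil

puts x  # prints John
```

Using the MatchData object rather than $1 keeps the result local, which may be easier to follow when several matches happen in the same method.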
If you only want the value of content:
html = '<meta property="og:title" content="John"/>'
=> "<meta property=\"og:title\" content=\"John\"/>"
html[/property="og:title" content="([^"]+)"/, 1]
=> "John"
If you're not familiar with regex, "([^"]+)" might throw you. It means: starting at the first ", grab everything until the next ". In effect it means "grab everything inside the double quotes".
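A small sketch of why the negated class is a safe choice here: it compares [^"]+ with a greedy .+ on a tag that has a second attribute. The extra lang attribute is a hypothetical addition for illustration and is not in the original page.

```ruby
# Hypothetical tag with a second attribute after content.
html = '<meta property="og:title" content="John" lang="en"/>'

# [^"]+ cannot cross a double quote, so it stops at the end of the value.
negated = html[/content="([^"]+)"/, 1]

# Greedy .+ runs to the end of the line and backtracks to the last quote.
greedy = html[/content="(.+)"/, 1]

puts negated  # prints John
puts greedy   # prints John" lang="en
```

The lazy (.+?) from the earlier answer gives the same result as [^"]+ on this input; the negated class just states the intent directly.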