Trying to extract attribute values using Nokogiri with custom pseudoclass CSS selectors

Having loaded an (X)HTML page, I'm trying to get the value of a meta tag's "content" attribute. For example, given:

<meta name="author" content="John Smith" />

I'd like to extract the value "John Smith".

I know how to do that using XPath, and I understand that CSS was meant primarily for element selection, but Nokogiri supports defining custom CSS pseudoclasses, which I thought could be used as follows:

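require 'nokogiri'
require 'open-uri'

# Handler object whose methods Nokogiri calls for custom pseudoclasses.
class CSSext
  def attr(nodeset, tag)
    nodeset.first.attribute_nodes.find_all { |node| node.name == tag }
  end
end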

doc = Nokogiri::HTML(URI.open(someurl))
doc.css("meta[name='author']:attr('content')", CSSext.new)

However, this returns the same result as:

doc.css("meta[name='author']")

What gives? Nokogiri uses the same engine underneath for both CSS and XPath searches, so anything that's possible in XPath should be doable in CSS. How should I go about extracting the attribute value?

Why not just do this?

doc.at("meta[name='author']")['content']

As far as I understand, pseudoclasses can only be used to filter the node set, not to replace it with some other value, such as the value of one of a node's attributes.
