Scraping a specific title with Nokogiri in Ruby

Question

I'm currently practicing web scraping using the NYT Best Sellers website. I want to get the title of the #1 book on the list and found the HTML element:

<div class="book-body">
  <p class="freshness">12 weeks on the list</p>
  <h3 class="title" itemprop="name">CRAZY RICH ASIANS</h3>
  <p class="author" itemprop="author">by Kevin Kwan</p>
  <p itemprop="description" class="description">A New Yorker gets a surprise when she spends the summer with her boyfriend in Singapore.</p>
</div>

I'm using the following code to grab the specific text:

doc.css(".title").text

However, it returns the titles of every book on the list. How would I go about getting just the specific book title, "CRAZY RICH ASIANS"?

Answer 1

If you look at the return value of doc.css(".title"), you will see it is a collection of all the titles, each one a Nokogiri::XML::Element object.

To my knowledge, CSS does not have a selector for targeting the first element of a given class (someone may certainly correct me if I am wrong), but getting just the first element from a Nokogiri::XML::NodeSet is still very simple, as it acts like an Array in many cases. For example:

doc.css(".title")[0].text

You could also use XPath to select just the first one (since XPath does support index-based selection), like so:

doc.xpath("(//h3[@class='title'])[1]").text

Please note:

Ruby indexes start at 0, as in the first example;
XPath indexes start at 1, as in the second example.
