Ruby: scraping table data with Nokogiri using the style attribute

Question

I want to scrape the text value of the <td> elements in this webpage, e.g. 0.2197 British Pound:

<table border=1 cellpadding=5 cellspacing=0 style="font-weight: normal; font-size: 10.5;"><tr><td width=50>1791</td><td>0.2195 British Pound</td></tr><tr><td width=50>1792</td><td>0.2239 British Pound</td></tr><tr><td width=50>1793</td><td>0.2218 British Pound</td></tr><tr><td width=50>1794</td><td>0.2106 British Pound</td></tr><tr><td width=50>1795</td><td>0.2209 British Pound</td></tr><tr><td width=50>1796</td><td>0.2333 British Pound</td></tr><tr><td width=50>1797</td><td>0.2254 British Pound</td></tr><tr><td width=50>1798</td><td>0.2279 British Pound</td></tr><tr><td width=50>1799</td><td>0.2420 British Pound</td></tr><tr><td width=50>1800</td><td>0.2199 British Pound</td></tr><tr><td width=50>1801</td><td>0.2283 British Pound</td></tr><tr><td width=50>1802</td><td>0.2230 British Pound</td></tr><tr><td width=50>1803</td><td>0.2202 British Pound</td></tr><tr><td width=50>1804</td><td>0.2197 British Pound</td></tr><tr><td width=50>1805</td><td>0.2300 British Pound</td></tr>

However, the webpage I am scraping has several tables, so I need a way to select this particular one.

This is what I've tried:

require 'nokogiri'
require 'open-uri'
exchange_rate_table = Nokogiri::HTML(URI.open('http://measuringworth.com/datasets/exchangeglobal/result.php?year_source=1791&year_result=2007&countryE%5B%5D=United+Kingdom'))
exchange_rate_table.css('td')

but that returns every <td> on the page, including some that are outside this table.

Answer 1

Your solution selects every <td> element on the page.

Here, you fetched and parsed the entire web page:

exchange_rate_table = Nokogiri::HTML(URI.open('http://measuringworth.com/datasets/exchangeglobal/result.php?year_source=1791&year_result=2007&countryE%5B%5D=United+Kingdom'))

Here, you found all <td> elements in the web page (if that's what you want):

exchange_rate_table.css('td')

This particular page contains only two <table> elements, and you want to exclude one of them.

Instead of finding all <td> elements, find the one table you want, and then find its <td> elements.

Fetch and parse the web page:

require 'open-uri' # provides URI.open for fetching URLs
web_page = Nokogiri::HTML(URI.open('http://measuringworth.com/datasets/exchangeglobal/result.php?year_source=1791&year_result=2007&countryE%5B%5D=United+Kingdom'))

Find the second table (the one with the exchange rates):

exchange_rate_table = web_page.css('table').last

Find all of the <td> elements in that table:

exchange_rate_cells = exchange_rate_table.css('td')

Answer 1: 0, accepted, 2015-05-23 23:57:53

使用样式属性Ruby使用Nokogiri刮擦表数据

问题描述

1 个解决方案

解决方案1 0 已采纳 2015-05-23 23:57:53

解决方案1
0 已采纳 2015-05-23 23:57:53