I want to make a simple Ruby program that can read the contents of a single html page and output two pieces of info into an array.
For instance, this is the webpage: http://www.trulia.com/real_estate/Cambridge-Massachusetts/
I want my output to be:
output = [ ["Mid-Cambridge", "$642,126"],
["North Cambridge", "$602,100"],
["East Cambridge", "$611,436"],
["Neighborhood Nine", "$1,068,284"],
["West Cambridge", "$1,577,444"] ]
I was thinking of doing something like:
File.read(filename).include?(each_neighborhood)
And from there, push each neighborhood and the price nearest to it in the html file into an array together, rinse and repeat. But I feel like this might not be the most efficient method, and I am not sure how to achieve it either.
I also heard that the gem 'search_in_file' could be useful. But it may not be necessary.
You may want to take a look at Nokogiri; it is a great gem when you need to work with web pages and extract information from them.
Here's a little script that does it:
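#!/usr/bin/env ruby
require 'nokogiri'
require 'open-uri'
url = "http://www.trulia.com/real_estate/Cambridge-Massachusetts/"
# On Ruby 3.0 and later, open-uri no longer patches Kernel#open, so call URI.open.
web_page = URI.open(url).read
doc = Nokogiri::HTML.parse( web_page )
# Select the name cells and the price cells inside the #most_popular element.
neighborhoods = doc.css('#most_popular td.txtL').map(&:text)
listing_prices = doc.css('#most_popular td.txtC').map(&:text)
# Pair each name with the price at the same index.
output = neighborhoods.zip(listing_prices)
puts output.inspect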
The output looks something like this:
[["Mid-Cambridge", "$642,126"],
["North Cambridge", "$602,100"],
["East Cambridge", "$611,436"],
["Neighborhood Nine", "$1,068,284"],
["West Cambridge", "$1,577,444"]]
Pretty much what you're looking for, right?
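If you later want the prices as numbers rather than strings, here is a minimal sketch using only the standard library. It assumes the output has exactly the shape shown above; parse_price is a hypothetical helper name, not something Nokogiri provides.

```ruby
# Sample data copied from the output shown above.
output = [
  ["Mid-Cambridge", "$642,126"],
  ["North Cambridge", "$602,100"],
  ["East Cambridge", "$611,436"],
  ["Neighborhood Nine", "$1,068,284"],
  ["West Cambridge", "$1,577,444"]
]

# Hypothetical helper: drop every character that is not a digit,
# then convert what is left to an Integer.
def parse_price(text)
  text.delete("^0-9").to_i
end

# Build a name => price hash so prices can be looked up and compared.
prices = output.map { |name, price| [name, parse_price(price)] }.to_h

# Find the neighborhood with the highest listing price.
most_expensive = prices.max_by { |_name, price| price }

puts prices.inspect
puts most_expensive.inspect
```

Stripping non-digits only works for whole-dollar amounts like these; a price with cents or a missing value would need extra handling.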