
Nokogiri and XPath: saving text result of scrape

I would like to save the text results of a scrape in a file. This is my current code:

require "rubygems"
require "open-uri"
require "nokogiri"

class Scrapper
  attr_accessor :html, :single

  def initialize(url)
    download = URI.open(url)  # Kernel#open no longer accepts URLs as of Ruby 3.0
    @page = Nokogiri::HTML(download)
    @html = @page.xpath('//div[@class = "quoteText" and following-sibling::div[1][@class = "quoteFooter" and .//a[@href and normalize-space() = "hard-work"]]]')
  end

  def get_quotes
    @quotes_array = @html.collect {|node| node.text.strip}
    @single = @quotes_array.each do |quote|
      quote.gsub(/\s{2,}/, " ") 
    end
  end
end

I know that I can write a file like this:

File.open('text.txt', 'w') do |fo|
  fo.write(content)
end

but I don't know how to incorporate @single, which holds the results of my scrape. The ultimate goal is to insert the information into a database.

I have come across some people using YAML, but I am finding the step-by-step guides hard to follow.

Can anyone point me in the right direction?

Thank you.

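each returns the original array, and gsub returns a new string without modifying the original, so @single ends up holding the uncleaned quotes. Just use map instead: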

@single = @quotes_array.map do |quote|
  quote.squeeze(' ')
end

File.open('text.txt', 'w') do |fo|
  fo.puts @single
end
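
To see why map matters here, the following minimal sketch compares what each and map return. The sample strings are made up and stand in for real scrape output:

```ruby
# Sample strings standing in for scraped quotes (hypothetical data).
quotes = ["Hard   work  pays", "Keep    going"]

# each returns the receiver itself; the block's return values are discarded,
# and gsub (without !) does not modify the string in place.
with_each = quotes.each { |quote| quote.gsub(/\s{2,}/, " ") }

# map returns a new array built from the block's return values.
with_map = quotes.map { |quote| quote.gsub(/\s{2,}/, " ") }

p with_each
p with_map
```

Only the map version contains the cleaned-up strings; the each version is the very same array object that went in.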

Or:

File.open('text.txt', 'w') do |fo|
  fo.puts @quotes_array.map{ |q| q.squeeze(' ') }
end

and don't bother creating @single.

Or:

File.open('text.txt', 'w') do |fo| 
  fo.puts @html.collect { |node| node.text.strip.squeeze(' ') }
end

and don't bother creating @single or @quotes_array.
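
All of these variants rely on puts writing each element of an array on its own line. Here is a small self-contained sketch of that behaviour, assuming a throwaway temp directory and sample strings in place of the scraped quotes:

```ruby
require "tmpdir"

# Hypothetical stand-ins for the scraped quotes.
quotes = ["First quote", "Second quote"]

lines = Dir.mktmpdir do |dir|
  path = File.join(dir, "text.txt")

  # puts given an array writes one element per line.
  File.open(path, "w") do |fo|
    fo.puts quotes
  end

  # Read the file back, dropping the trailing newlines.
  File.readlines(path, chomp: true)
end

p lines
```

Reading the file back line by line like this could also serve as a starting point for loading the quotes into a database later, provided each quote occupies a single line.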

squeeze is a method of the String class. This is from the documentation:

"  now   is  the".squeeze(" ")         #=> " now is the"
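
Note that squeeze(' ') is not an exact replacement for the gsub(/\s{2,}/, " ") call in the question: it only collapses runs of spaces, whereas \s also matches tabs and newlines. A quick sketch with a made-up string illustrates the difference:

```ruby
# Made-up string with a line break between the words.
text = "hard \n  work"

# squeeze(" ") collapses only consecutive spaces; the newline survives.
squeezed = text.squeeze(" ")

# \s{2,} matches any run of two or more whitespace characters,
# so the space-newline-space-space run becomes a single space.
substituted = text.gsub(/\s{2,}/, " ")

p squeezed
p substituted
```

If the scraped text may contain line breaks, the regex form keeps each quote on a single line in the output file.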
