How do I scrape a website and output data to an XML file with Nokogiri?

I've been trying to scrape data using Nokogiri and HTTParty. I can scrape data off a website successfully and print it to the console, but I can't work out how to output the data to an XML file in the repo.

Right now the code looks like this:

require 'httparty'
require 'nokogiri'

class Scraper

  attr_accessor :parse_page

  def initialize
    doc = HTTParty.get("https://store.nike.com/gb/en_gb/pw/mens-nikeid-lifestyle-shoes/1k9Z7puZoneZoi3?ref=https%253A%252F%252Fwww.google.com%252F")
    @parse_page ||= Nokogiri::HTML(doc)
  end

  def get_names
    item_container.css(".product-display-name").css("p").children.map { |name| name.text }.compact
  end

  def get_prices
    item_container.css(".product-price").css("span.local").children.map { |price| price.text }.compact
  end

  private

  def item_container
    parse_page.css(".grid-item-info")
  end

end

# Script code lives outside the class body
scraper = Scraper.new
names = scraper.get_names
prices = scraper.get_prices

(0...prices.size).each do |index|
  puts " - - - Index #{index + 1} - - -"
  puts "Name: #{names[index]} | Price: #{prices[index]}"
end

I've tried changing the .each method to include a File.write(), but all it ever does is write the last line of the output into the XML file. I would appreciate any insight as to how to parse the data correctly; I am new to scraping.

"I've tried changing the .each method to include a File.write() but all it ever does is write the last line of the output into the xml file."

Is the File.write call inside the each loop? I guess what's happening here is that you are overwriting the file on every iteration, and that's why you are seeing only the last line.

Try putting the each loop inside the block of the File.open method, like this:

File.open(yourfile, 'w') do |file|
  (0...prices.size).each do |index|
    file.write("your text")
  end
end
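
As a slightly more concrete sketch of the same idea, assuming names and prices are plain arrays of strings: the sample values and the temporary output path below are made up for illustration and simply stand in for the scraper's results.

```ruby
require "tmpdir"

# Hypothetical sample data standing in for scraper.get_names / scraper.get_prices.
names  = ["Shoe A", "Shoe B", "Shoe C"]
prices = ["GBP 100", "GBP 120", "GBP 95"]

# Hypothetical output path; a temp directory keeps the example out of the repo.
path = File.join(Dir.mktmpdir, "products.txt")

# Open the file once; every write inside the block goes to the same handle,
# so earlier lines are not lost.
File.open(path, "w") do |file|
  names.zip(prices).each_with_index do |(name, price), index|
    file.puts "#{index + 1}. Name: #{name} | Price: #{price}"
  end
end

# Read the file back to confirm that all lines were written.
lines = File.readlines(path, chomp: true)
puts lines
```

By contrast, calling File.write(path, text) inside the loop reopens and truncates the file on every iteration, which would explain why only the last line survives.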

I also recommend reading about Nokogiri::XML::Builder and then saving its output to the file.
