Web Scraping with Nokogiri::HTML and Ruby - Output to CSV issue

I have a script that scrapes the HTML article pages of a webshop. I'm testing with a set of 22 pages, 5 of which have a product description while the others don't.

This code prints the right information to the screen:

if doc.at_css('.product_description')
  doc.css('div > .product_description > p').each do |description|
    puts description
  end
else
  puts "no description"
end

But now I'm stuck on how to collect the found product descriptions correctly into an array, from which I then write them to a CSV file.

I've tried several options, but none of them works so far. If I replace puts description with @description << description.content, all the article descriptions end up in the first rows of the CSV, even though they don't belong to the articles in those rows.
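A likely cause, sketched under the assumption that the other arrays (titles, prices and so on) get exactly one entry per page: `<<` inside the `each` loop adds one entry per `<p>` paragraph, and nothing at all for pages without a description, so index i of `@description` drifts away from page i. The `pages` data below is hypothetical; it stands in for the paragraphs Nokogiri finds on each article page.

```ruby
# Hypothetical stand-in for the scraped pages: each inner array holds the
# description paragraphs found on one article page (empty = no description).
pages = [
  ["Red shirt.", "100% cotton."],
  [],
  ["Blue mug."]
]

# Buggy: one entry per paragraph and none for pages without a description,
# so index i no longer corresponds to page i.
buggy = []
pages.each do |paragraphs|
  paragraphs.each { |para| buggy << para }
end

# Fixed: exactly one entry per page, so index i always matches page i.
fixed = pages.map do |paragraphs|
  paragraphs.empty? ? "no description" : paragraphs.join("\n")
end

p buggy  # => ["Red shirt.", "100% cotton.", "Blue mug."]
p fixed  # => ["Red shirt.\n100% cotton.", "no description", "Blue mug."]
```

In the scraper itself this would mean pushing one value per page (for example the joined `description.content` strings, or "no description") rather than one value per paragraph.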

When I also replace puts "no description" with @description = "no description", the first 14 lines of my CSV each receive one letter of "no description". It looks funny, but it is not what I need.
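The 14 single letters have a simple explanation: `@description = "no description"` replaces the array with a String, and `String#[]` with an integer index returns a single character. "no description" is exactly 14 characters long, so rows 0 to 13 each get one letter and later rows get `nil`. A minimal illustration, using a local variable in place of `@description`:

```ruby
# Stand-in for @description after the assignment: a String, not an Array
description = "no description"

# The CSV loop indexes it as if it were an array, one index per row
first_rows = (0...3).map { |index| description[index] }

p description.length  # => 14
p first_rows          # => ["n", "o", " "]
p description[14]     # => nil
```

Appending with `@description << "no description"` instead keeps `@description` an array and adds exactly one element for that page.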

If more code is needed, just shout!

This is the CSV code I use in the script:

    CSV.open("artinfo.csv", "wb") do |row|
      row << ["category", "sub-category", "sub-sub-category", "price", "serial number", "title", "description"]
      (0..@prices.length - 1).each do |index|
        row << [
          @categories[index],
          @subcategories[index],
          @subsubcategories[index],
          @prices[index],
          @serial_numbers[index],
          @title[index],
          @description[index]
        ]
      end
    end
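An alternative that avoids keeping seven parallel arrays in sync is to build one record per article page while that page is parsed, then write each record as one row. This is a sketch with hypothetical data and a reduced set of columns; it uses `CSV.generate` so it runs without writing a file, whereas the script itself would keep `CSV.open("artinfo.csv", "wb")`.

```ruby
require "csv"

# Hypothetical scrape results: one Hash per article page, built while that
# page is parsed, so every field in a row comes from the same page.
articles = [
  { category: "Clothing", price: "19.95", title: "Shirt", description: "Red shirt." },
  { category: "Kitchen",  price: "7.50",  title: "Mug",   description: "no description" }
]

columns = [:category, :price, :title, :description]

csv_text = CSV.generate do |csv|
  csv << columns.map(&:to_s)
  articles.each do |article|
    # values_at returns the fields in column order (nil for a missing key)
    csv << article.values_at(*columns)
  end
end

puts csv_text
```

Because each row is assembled from a single Hash, a page without a description can only affect its own row.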

It sounds like your data isn't lined up properly. If it were, you should be able to do:

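require 'csv'

CSV.open("artinfo.csv", "w") do |csv|
  csv << ["category", "sub-category", "sub-sub-category", "price", "serial number", "title", "description"]
  # transpose turns the parallel column arrays into one row per article;
  # it raises IndexError if the arrays have different lengths
  [@categories, @subcategories, @subsubcategories, @prices, @serial_numbers, @title, @description].transpose.each do |row|
    csv << row
  end
end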
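The following sketch shows what `transpose` does with two hypothetical columns, and why it doubles as a check here: when one column is shorter, it raises `IndexError` instead of quietly shifting descriptions into the wrong rows. `CSV.generate` is used in place of `CSV.open` so the example runs without creating a file.

```ruby
require "csv"

# Hypothetical parallel columns, one entry per article page
header = ["title", "description"]
titles = ["Shirt", "Mug", "Lamp"]
descriptions = ["Red shirt.", "no description", "Brass lamp."]

# transpose turns the columns into rows: one [title, description] per article
rows = [titles, descriptions].transpose

csv_text = CSV.generate do |csv|
  csv << header
  rows.each { |row| csv << row }
end
puts csv_text

# If one column is shorter (e.g. a page without a description was skipped),
# transpose raises IndexError rather than misaligning the rows.
error = nil
begin
  [titles, descriptions.first(2)].transpose
rescue IndexError => e
  error = e
  puts "columns out of sync: #{e.message}"
end
```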
