
How do I avoid EOFError with Ruby script?

I have a Ruby script (1.9.2p290) where I am trying to call a number of URLs, and then append information from those URLs into a file. The issue is that I keep getting an end of file error - EOFError. An example of what I'm trying to do is:

require "open-uri"
proxy_uri = URI.parse("http://IP:PORT")
somefile = File.open("outputlist.txt", 'a')
output = []   # collects the <img> count for each page

(1..100).each do |num|
  # num is an Integer, so convert it before appending it to the URL string
  page = open('SOMEURL' + num.to_s, :proxy => proxy_uri).read
  pattern = "<img"
  tags = page.scan(pattern)
  output << tags.length
end
somefile.puts output
somefile.close

I don't know why I keep getting this end of file error or how to avoid it. I think it might have something to do with the URL I'm calling (based on the discussion here: What is an EOFError in Ruby file I/O?), but I'm not sure why that would affect the I/O or cause an end of file error.

Any thoughts on what I might be doing wrong here or how I can get this to work?

Thanks in advance!

The way you are writing your file isn't idiomatic Ruby. This should work better:

output = []

(1..100).each do |num|
  page = open('SOMEURL' + num.to_s, :proxy => proxy_uri).read
  pattern = "<img"
  tags = page.scan(pattern)
  output << tags.length
end

File.open("outputlist.txt", 'a') do |fo|
  fo.puts output
end

I suspect the file is being closed because it's opened and then not written to while the 100 pages are processed. If that takes a while, I can see why the system might close it to keep apps from using up all the file handles. Writing it the Ruby way, with a block, closes the file automatically as soon as the block finishes, so no handle is held open while the pages are being fetched.
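A quick way to see what the block form buys you, using only the standard library: this sketch writes some made-up counts to a file in a temporary directory (standing in for outputlist.txt) and then checks that File.open has already closed the handle once the block returns. The counts and the temporary path are assumptions for the demo only.

```ruby
require "tmpdir"

# Hypothetical per-page <img> counts standing in for the scraped results.
output = [3, 0, 12]

dir = Dir.mktmpdir                          # scratch directory for the demo (not removed automatically)
path = File.join(dir, "outputlist.txt")
handle = nil

# Block form: File.open yields the handle and closes it when the block
# exits, whether the block finishes normally or raises.
File.open(path, "a") do |fo|
  handle = fo
  fo.puts output                            # IO#puts writes each array element on its own line
end

puts handle.closed?                         # prints "true"
print File.read(path)                       # prints 3, 0 and 12 on separate lines
```

Because the close happens in the block's cleanup, the same holds if something inside the block raises, which is why the block form is preferred over a manual open/close pair.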

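As a secondary point, rather than using a simple string match to locate image tags, use a real HTML parser. There will be little difference in processing speed, but the count is likely to be more accurate: scan("<img") is case-sensitive, so it misses tags written as <IMG, and it also counts image tags that only appear inside HTML comments.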
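To illustrate the accuracy point, here is a small page fragment (made up for this example, not from the question) with one real image whose tag name is uppercase and two image tags that are commented out. The string scan counts the two commented-out tags and misses the real one. An HTML parser such as Nokogiri should report only the real image, since HTML tag names are case-insensitive and comments are not elements.

```ruby
# Hypothetical page fragment: one real image (uppercase tag name)
# and two image tags that are commented out and never rendered.
page = [
  '<p><IMG src="logo.png" alt="logo"></p>',
  '<!-- <img src="old-banner.png"> -->',
  '<!-- <img src="old-footer.png"> -->'
].join("\n")

# The approach from the question: a case-sensitive substring count.
naive_count = page.scan("<img").length

puts naive_count   # prints 2, although the page renders only 1 image
```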

Replace:

page = open('SOMEURL' + num.to_s, :proxy => proxy_uri).read
pattern = "<img"   
tags = page.scan(pattern)
output << tags.length

with:

require 'nokogiri'

doc = Nokogiri::HTML(open('SOMEURL' + num.to_s, :proxy => proxy_uri))
output << doc.search('img').size
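Separately from how the output file is written, the EOFError itself may well be raised while reading the page: open-uri goes through Net::HTTP, which raises EOFError when the server or proxy closes the connection before the response is complete. If that is what's happening, one option (not part of the answer above, and only a sketch) is to retry the fetch a few times. with_eof_retries is a hypothetical helper name and the default of 3 attempts is an arbitrary assumption; the demo uses a fake fetch instead of a real URL.

```ruby
# Hypothetical helper: runs the block, retrying when it raises EOFError,
# up to `attempts` tries in total, then re-raises the last EOFError.
def with_eof_retries(attempts = 3)
  tries = 0
  begin
    tries += 1
    yield
  rescue EOFError
    retry if tries < attempts
    raise
  end
end

# Demo with a fake fetch that fails twice before succeeding.
failures_left = 2
calls = 0
result = with_eof_retries(3) do
  calls += 1
  if failures_left > 0
    failures_left -= 1
    raise EOFError, "end of file reached"
  end
  "<html>ok</html>"
end

puts result   # prints <html>ok</html>
puts calls    # prints 3
```

In the loop from the question this would wrap the fetch, e.g. page = with_eof_retries { open('SOMEURL' + num.to_s, :proxy => proxy_uri).read }. If every attempt fails, the last EOFError is re-raised, so you may still want to rescue it around that call and skip the page.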
