Here's the code for the scraper
class Scrape
def perform
url = "# a long url"
agent = Mechanize.new
agent.get(url)
while(agent.page.link_with(:text => "Next Page \u00BB")) do
agent.page.search(".content").each do |item|
puts "."
House.create!({
# attributes...
})
end
agent.page.link_with(:text => "Next Page \u00BB").click
end
end
end
On my local environment I can run it in the rails console just by typing
Scrape.new.delay.perform # to queue the job
rake jobs:work
and it works perfectly.
However running the analogous (with a worker running instead of rake jobs:work) in the Heroku console doesn't seem to do anything. I tried logging some lines in the Heroku log and I can get the url variable to log (so the method is at least getting called) but the "." which is there to show each time we run the while loop never appears and no Houses are created in the database.
Anyone any ideas what might be wrong?
Solved this problem myself, pretty obscure bug though. I was using ruby 1.9.2 in my local environment but I had the app deployed on a ruby 1.8.7 stack.
The important difference being the change in character encoding between the two ruby versions which meant that Mechanize couldn't find a link with the unicode encoded character "\»" and thus didn't do any scraping.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.