
How can I crawl an HTML5 website and convert its HTML content to PDF (using a Python or Ruby library)?

I'm looking for an engine/solution/framework/gem/egg/lib/whatever for either Ruby or Python to log into a website, crawl HTML5 content (mainly charts on a canvas), and be able to convert it into a PDF file (or image).

I'm able to write crawling scripts in mechanize so I can log onto the website and crawl the data, but mechanize does not understand complex JavaScript + HTML5.

So basically I'm looking for an HTML5/JavaScript interpreter.

This question is a bit confusing, so apologies in advance; you may want to re-read this answer after reading the question again.

Your question has two parts:

1. How can I crawl a website

Crawling can be done using Mechanize, but as you said, it doesn't handle JavaScript very well. One alternative is to use Capybara-webkit or Selenium (driving Firefox or Chrome), both of which run a real browser engine.

Usually these tools are used for testing, but you can drive them from Ruby code to log in and navigate the various pages.

2. How can I convert the output to PDF

If you need to convert the crawled content to PDF directly, I don't think there is a built-in way. You can, however, take a screenshot (also useful for testing) with Capybara-webkit or Selenium, and converting that screenshot to PDF is then just a matter of piping it through a command-line utility.

If you're looking for a true HTML-to-PDF converter (usually used to generate reports from views in a Rails app), then use PDFKit.

Basically it drives a headless WebKit browser (wkhtmltopdf) that can output to PDF, and it's really simple to work with.
