How can I convert a web page with JavaScript to plain HTML?

I want to convert some web pages that use JavaScript to plain HTML, and I have found several ways to do it (please tell me if I'm wrong):

  1. Use Jython; an example: http://blog.databigbang.com/web-scraping-ajax-and-javascript-sites/
  2. Use Java together with HtmlUnit
  3. Use a proxy; an example: http://grep.codeconsult.ch/2007/02/24/crowbar-scrape-javascript-generated-pages-via-gecko-and-rest/
  4. Use Python together with Qt or PyV8

I only want to build a tiny tool for this, and although Python is my first choice, installing V8 and Qt seems somewhat complicated.

So I tried to build a proxy with Gecko, but it seems to need a DISPLAY, which I do not have on a remote Linux server.

Now I am trying Jython, but there seems to be no simple way to convert a whole page to plain HTML.

What I really want to ask is whether there is a way to convert a web page that contains JavaScript to plain HTML, just as the browser does. Can Node.js do this job?

I've recently built a server on top of PhantomJS that does this. I highly recommend this route.

http://phantomjs.org/

Basically, you write a quick script that has PhantomJS run the page, and you configure a trigger method that tells you when the page is finished and sends the data off. My version used the built-in HTTP server, so PhantomJS served up the results on its own. This takes about 15 lines of code. (Sorry, I can't paste it here; I wrote it on work time. But check out the example on their home page. It's almost complete!)
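Since the original script isn't shown, here is a minimal sketch of what such a PhantomJS rendering server might look like. It is not the code described above. The port, the `?url=` query parameter, the `parseTargetUrl` helper, and the fixed two-second delay used as the "finished" trigger are all assumptions made for illustration.

```javascript
// Sketch of a PhantomJS rendering server (run with: phantomjs render-server.js).
// PORT, SETTLE_MS, and the ?url= parameter are illustrative assumptions.
var PORT = 8080;
var SETTLE_MS = 2000;

// Hypothetical helper: pull the target address out of a request path such as
// "/?url=http%3A%2F%2Fexample.com". Returns null when no url parameter is present.
function parseTargetUrl(requestUrl) {
  var match = /[?&]url=([^&]+)/.exec(requestUrl);
  return match ? decodeURIComponent(match[1]) : null;
}

// Start the server only under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var server = require('webserver').create();

  var listening = server.listen(PORT, function (request, response) {
    var target = parseTargetUrl(request.url);
    if (!target) {
      response.statusCode = 400;
      response.write('Missing ?url= parameter');
      response.close();
      return;
    }

    var page = require('webpage').create();
    page.open(target, function (status) {
      if (status !== 'success') {
        response.statusCode = 502;
        response.write('Could not load ' + target);
        response.close();
        page.close();
        return;
      }
      // Crude "finished" trigger: wait a fixed time for the page's scripts
      // to run, then send back the serialized DOM as plain HTML.
      setTimeout(function () {
        response.statusCode = 200;
        response.setHeader('Content-Type', 'text/html; charset=utf-8');
        response.write(page.content);
        response.close();
        page.close();
      }, SETTLE_MS);
    });
  });

  if (!listening) {
    console.log('Could not listen on port ' + PORT);
    phantom.exit(1);
  }
}
```

With this running, a request such as `http://localhost:8080/?url=http%3A%2F%2Fexample.com` should return the page's HTML after its scripts have executed. A fixed delay is the simplest trigger but not the most reliable one; if you control the page, having it call `window.callPhantom()` and handling that in `page.onCallback` is one alternative.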

