
Collect all the JS, CSS, and image resources used in an HTML file

Original post: 2016-08-02 02:54:38. Tags: javascript, html, node.js

I want to write an npm package that localizes an HTML page, given its URL:
1. Download the HTML page from the URL.
2. Parse the HTML file, extract all the JS, CSS, and image files it uses, and save those resources locally.
3. If those JS, CSS, and image files reference external resources of their own, localize those as well. For example, extract the background images referenced in the CSS.
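
Step 2 can be sketched roughly as follows for Node.js. This is only an illustration, not a robust parser: extractResources is a hypothetical helper name, the regular expressions only handle quoted attribute values, and a real package would more likely use a proper HTML parser. Relative references are resolved against the page URL with the built-in URL class.

```javascript
// Sketch only: collect script, stylesheet, and image URLs from an HTML string.
// `extractResources` is a hypothetical helper, not part of any library.
function extractResources(html, baseUrl) {
  // Each pattern captures a quoted attribute value in group 1.
  var patterns = {
    js: /<script\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi,
    css: /<link\b(?=[^>]*\brel\s*=\s*["']?stylesheet)[^>]*?\shref\s*=\s*["']([^"']+)["']/gi,
    img: /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi
  };
  var found = { js: [], css: [], img: [] };

  Object.keys(patterns).forEach(function (kind) {
    var match;
    while ((match = patterns[kind].exec(html)) !== null) {
      // Resolve relative references against the page URL.
      var absolute = new URL(match[1], baseUrl).href;
      if (found[kind].indexOf(absolute) === -1) {
        found[kind].push(absolute);
      }
    }
  });
  return found;
}

// Usage with a made-up page and URL:
var sampleHtml =
  '<link rel="stylesheet" href="/css/site.css">' +
  '<script src="js/app.js"></script>' +
  '<img src="img/logo.png">';
console.log(extractResources(sampleHtml, 'http://example.com/page/index.html'));
```

Only resources that appear literally in the markup are found this way, which is exactly why step 3 is the hard part.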

The first and second requirements are easy to meet, but I have no idea how to handle the last one. I can parse all the CSS files and localize the resources they use, but how can I parse the JS files?
For example, if a script adds a '<script src="XXX">' tag to the HTML DOM, how can I extract the src?
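
For the CSS half of step 3, a minimal sketch could look like the one below. It assumes the stylesheet text has already been downloaded; extractCssUrls is a hypothetical helper name, and the pattern only covers url(...) references, so an @import written with a bare string is not handled.

```javascript
// Sketch only: list the absolute URLs referenced through url(...) in a stylesheet.
// `extractCssUrls` is a hypothetical helper, not part of any library.
function extractCssUrls(css, cssUrl) {
  // Matches url("x"), url('x') and url(x); the reference lands in group 1, 2, or 3.
  var pattern = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]+))\s*\)/gi;
  var urls = [];
  var match;
  while ((match = pattern.exec(css)) !== null) {
    var ref = match[1] || match[2] || match[3];
    // Skip empty references and inline data URIs, which need no download.
    if (!ref || /^data:/i.test(ref)) {
      continue;
    }
    // Relative references in CSS resolve against the stylesheet URL, not the page URL.
    var absolute = new URL(ref, cssUrl).href;
    if (urls.indexOf(absolute) === -1) {
      urls.push(absolute);
    }
  }
  return urls;
}

// Usage with a made-up stylesheet and URL:
var sampleCss =
  'body { background: url("../img/bg.png"); }' +
  '.icon { background-image: url(icons/a.gif); }';
console.log(extractCssUrls(sampleCss, 'http://example.com/css/site.css'));
```

The same static approach does not carry over to JS files, because the URL of an injected script may only exist once the code runs.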

1 answer

I would try using a headless browser to capture every network request instead of trying to parse the code.

I haven't used it personally, but PhantomJS seems to fit the bill.

It can load a web page, run whatever scripts and CSS the page would normally load, and then execute your own code once the page has finished loading.

Its network monitoring features are probably what you'll want to use.
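
As a rough sketch of that idea (untested against a real site, in keeping with the caveat above), a PhantomJS script can register page.onResourceRequested and record every URL the page asks for, including scripts injected at run time. The file name capture.js, the classifyResource helper, and the 2-second settle delay are all assumptions made for illustration.

```javascript
// Sketch only. Run with: phantomjs capture.js http://example.com/
// (capture.js is a hypothetical file name.)

// Hypothetical helper: classify a URL by its file extension.
function classifyResource(url) {
  var path = url.split(/[?#]/)[0].toLowerCase();
  if (/\.js$/.test(path)) return 'js';
  if (/\.css$/.test(path)) return 'css';
  if (/\.(png|jpe?g|gif|svg|webp|ico)$/.test(path)) return 'img';
  return 'other';
}

// The capture only runs under PhantomJS, which defines a global `phantom` object.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var target = system.args[1]; // first command-line argument after the script name
  var seen = {};

  // Called for every request the page issues, including ones triggered by scripts.
  page.onResourceRequested = function (requestData) {
    if (!seen[requestData.url]) {
      seen[requestData.url] = classifyResource(requestData.url);
    }
  };

  page.open(target, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + target);
      phantom.exit(1);
      return;
    }
    // Wait briefly so late scripts can issue their requests (the delay is arbitrary).
    setTimeout(function () {
      Object.keys(seen).forEach(function (url) {
        console.log(seen[url] + '\t' + url);
      });
      phantom.exit(0);
    }, 2000);
  });
}
```

The printed list could then be fed to whatever download step the npm package already uses for the statically discovered resources.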