
Collect all the JS, CSS, and image resources used in an HTML file

I want to write an npm package that localizes a web page, given its URL:
1. Download the HTML page from the URL.
2. Parse the HTML file, extract all the JS, CSS, and image files it uses, and save them locally.
3. If those JS, CSS, and image files reference external resources of their own, localize those as well. For example, extract the background images referenced in the CSS.
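As a rough illustration of steps 2 and 3, the statically visible references can be collected with something like the sketch below. The helper names (resolveRef, getAttr, extractHtmlResources, extractCssUrls) are hypothetical, and the regex matching is an assumption made to keep the example dependency-free; a real HTML or CSS parser would likely be more robust.

```javascript
// Hypothetical helper: resolve a reference against a base URL, skipping data: URIs.
function resolveRef(ref, baseUrl) {
  if (/^data:/i.test(ref)) return null;
  try {
    return new URL(ref, baseUrl).href;
  } catch (e) {
    return null; // not a resolvable URL
  }
}

// Hypothetical helper: read one attribute value (quoted or unquoted) from a tag string.
function getAttr(tag, name) {
  const re = new RegExp(
    '\\s' + name + '\\s*=\\s*(?:"([^"]*)"|\'([^\']*)\'|([^\\s>]+))', 'i');
  const m = re.exec(tag);
  return m ? (m[1] || m[2] || m[3] || null) : null;
}

// Step 2: collect script, stylesheet, and image URLs written directly in the markup.
function extractHtmlResources(html, baseUrl) {
  const found = new Set();
  const tags = html.match(/<(?:script|link|img)\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const isLink = /^<link/i.test(tag);
    // Only <link> tags whose rel includes "stylesheet" count as CSS.
    if (isLink && !/\brel\s*=\s*["']?[^"'>]*\bstylesheet\b/i.test(tag)) continue;
    const ref = getAttr(tag, isLink ? 'href' : 'src');
    const url = ref && resolveRef(ref, baseUrl);
    if (url) found.add(url);
  }
  return [...found];
}

// Step 3 (CSS part): collect url(...) references, resolved against the stylesheet URL.
function extractCssUrls(css, cssUrl) {
  const found = new Set();
  const re = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]+))\s*\)/gi;
  let m;
  while ((m = re.exec(css)) !== null) {
    const ref = m[1] || m[2] || m[3];
    const url = ref && resolveRef(ref, cssUrl);
    if (url) found.add(url);
  }
  return [...found];
}
```

Note that url(...) references in a stylesheet are resolved relative to the stylesheet's own URL, not the page URL, which is why extractCssUrls takes the CSS file's URL as its base. A scan like this only sees references that are literally present in the text; anything a script adds at runtime is invisible to it.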

The first and second requirements are easy to meet, but I have no idea how to handle the last one. I can parse all the CSS files and localize the resources they use, but how can I parse the JS files?
For example, if the JS adds a 'script src = XXX' tag to the HTML DOM, how can I extract the src?

I would try using a headless browser to capture every network call instead of trying to parse the code.

I haven't used it personally, but PhantomJS seems to fit the bill.

It can load a web page, run any scripts and CSS that would normally execute on that request, and then run your own code once the page has loaded.

The network monitoring features are probably what you'll want to use.
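A minimal sketch of that approach, assuming PhantomJS's webpage module and its onResourceRequested callback; the collectRequests helper, the collect.js filename, and the 2-second wait are illustrative choices, not anything prescribed by PhantomJS. It is written in ES5 because that is what PhantomJS executes.

```javascript
// Attach a network listener to a PhantomJS-style page object and return the
// array that receives each requested URL (de-duplicated, in request order).
function collectRequests(page) {
  var urls = [];
  page.onResourceRequested = function (requestData) {
    if (urls.indexOf(requestData.url) === -1) {
      urls.push(requestData.url);
    }
  };
  return urls;
}

// This part only runs under the phantomjs binary, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var target = system.args[1]; // page URL passed on the command line
  var requested = collectRequests(page);

  page.open(target, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + target);
      phantom.exit(1);
      return;
    }
    // Assumption: wait 2 s so scripts that inject tags after load get a chance to run.
    setTimeout(function () {
      console.log(requested.join('\n'));
      phantom.exit(0);
    }, 2000);
  });
}
```

Run as, say, phantomjs collect.js followed by the page URL, this should print every URL the page requested, including scripts and images that were added to the DOM by other scripts. The fixed delay is a guess; resources requested later than that (for example on user interaction) would still be missed.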
