
How would I go about creating a web page copier application?

Hi, I do a lot of research on the web, but in most cases I don't have connectivity to the WWW for the whole time I'm researching. So I would like to get started on creating a website copier application. I am aware of HTTrack Website Copier, but that program has its limitations; for example, it can't copy any of the Wikipedia pages, probably because it accesses the target's servers on a specific port that only permits access to a certain degree. What I'm looking for is basically not copying the "whole" site but just pieces of it. The procedure for copying a web page is simple if you're doing it manually: 1) Go to the URL. 2) Click File > Save Page As. 3) Make sure you save as "complete page". Three easy steps. How would I automate this? I could use a macro, but that just makes more work than it has to be; triple the effort.

I could probably create a plug-in for Firefox where you put in a list of URLs that you would like saved onto your machine, but I'm not very familiar with the API/SDK. I could probably look at the HTTrack source, salvage some parts, and put a nice puzzle together. But if I were to do this from scratch, what APIs would I need to look at in C, C++, or Java? I'm not looking for a GUI, just a simple program. So what are your thoughts?

If you're wondering what I'm researching: mathematics, telecommunications, programming, computer architecture, magnetism. Books cost money, sometimes give more info than you need, and are not as portable as a netbook. Just because I'm researching these fields doesn't mean I'm a know-it-all, so any help would be much appreciated.

wget --mirror http://example.com

You might want to check out wget as well. For Java: http://www.koders.com/java/fid8A3F9CE8B64CA6212A5018CF8A345BCC58796ACE.aspx?s=Quota#L95
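For the "list of URLs, no GUI" version of this in Java, a minimal sketch might look like the following. It assumes Java 11 or later and uses the standard `java.net.http.HttpClient`; the class name `PageSaver`, the helper `fileNameFor`, and the User-Agent string are made up for illustration. Some sites may reject requests that lack a descriptive User-Agent, so the sketch sets one, but whether that is what stops a given tool on a given site is only a guess.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class PageSaver {

    // Hypothetical helper: turn a URL into a flat, filesystem-safe file name.
    static String fileNameFor(String url) {
        String name = url.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://", "") // drop the scheme
                         .replaceAll("[^A-Za-z0-9._-]", "_");              // replace unsafe chars
        if (name.isEmpty()) {
            name = "index";
        }
        return name.endsWith(".html") ? name : name + ".html";
    }

    // Fetch one URL and write the response body into outDir.
    static Path save(HttpClient client, String url, Path outDir)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "PageSaver-sketch/0.1 (personal offline reading)") // assumed value
                .GET()
                .build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        Path target = outDir.resolve(fileNameFor(url));
        return Files.write(target, response.body()); // creates or overwrites the file
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java PageSaver.java <url-list.txt> <output-dir>");
            return;
        }
        List<String> urls = Files.readAllLines(Path.of(args[0])); // one URL per line
        Path outDir = Files.createDirectories(Path.of(args[1]));
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        for (String line : urls) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue;
            }
            try {
                System.out.println("saved " + save(client, url, outDir));
            } catch (IOException | IllegalArgumentException e) {
                System.err.println("skipped " + url + ": " + e.getMessage());
            }
        }
    }
}
```

Run it as `java PageSaver.java urls.txt saved-pages`. Note that this only stores the HTML of each page; it does not download images, stylesheets, or scripts the way the browser's "complete page" option does, so for that a tool like wget is probably less work.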

For C++, check this old Stack Overflow question and answer: Options for web scraping - C++ version only
