How to extract content from crawled URLs in the Heritrix crawler tool

Question

I am new to Heritrix. I can now crawl web pages from the www, and I want to extract the content of the crawled URLs.

Could anyone please help me? Thanks.

Answer 1

 Heritrix typically stores crawled content in WARC files (for example example/at.warc.gz), and the warcat Python tool can read them. These steps install Python 3.3 and warcat, then list the records in a WARC file:

 1. As root, download the Python source (3.3.0 or later): wget http://python.org/ftp/python/3.3.0/Python-3.3.0.tgz
 2. Extract the archive and change into the source directory: tar xzf Python-3.3.0.tgz && cd Python-3.3.0
 3. Configure the build with an install prefix, for example /opt/python3.3: ./configure --prefix=/opt/python3.3
 4. make
 5. sudo make install
 6. Check that the new interpreter starts: /opt/python3.3/bin/python3
 7. Create a virtual environment: /opt/python3.3/bin/pyvenv ~/py33
 8. Activate it: source ~/py33/bin/activate
 9. wget http://python-distribute.org/distribute_setup.py
 10. python distribute_setup.py
 11. easy_install pip
 12. pip install bottle
 13. pip install warcat
 14. Check that warcat installed correctly: python3 -m warcat --help. The help output lists commands such as list, concat, and extract.
 15. List the records in a WARC file: python3 -m warcat list example/at.warc.gz

 This worked for me.
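
To see what a tool like warcat is reading, here is a minimal sketch that walks a WARC file with only the Python standard library and returns the target URL and payload of each response record. It is not warcat's API: the function names iter_warc_records, http_payload and list_responses are made up for this illustration, and it assumes well-formed records that each carry a Content-Length header. It does not undo chunked or compressed HTTP bodies.

```python
import gzip


def iter_warc_records(stream):
    """Yield (headers, block) for each record in a binary WARC stream.

    Hypothetical helper: assumes every record starts with a "WARC/" version
    line and declares its block size in a Content-Length header.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("unexpected line in WARC stream: %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the record headers
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", 0))
        yield headers, stream.read(length)


def http_payload(block):
    """Return the part of an HTTP response block after its header section."""
    _, separator, payload = block.partition(b"\r\n\r\n")
    return payload if separator else block


def list_responses(path):
    """Yield (target URI, payload bytes) for each response record in a file."""
    # gzip.open reads multi-member .warc.gz files as one continuous stream.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as stream:
        for headers, block in iter_warc_records(stream):
            if headers.get("warc-type") == "response":
                yield headers.get("warc-target-uri"), http_payload(block)
```

With the file from the last step, looping over list_responses("example/at.warc.gz") gives one (URL, content) pair per crawled page, which can then be written to disk or parsed further.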

Answer 1
3 Accepted 2013-09-23 07:13:11
