
Python web crawling and storing to MySQL

I need a good web crawler written in Python that stores complete pages in a MySQL database. The small system I am experimenting with currently uses the PHP crawler Sphider to crawl and store pages in the database. I need something that works almost exactly like Sphider, but written in Python: it only has to store pages in a table, from which my other scripts take the content and do the rest of the job. Sphider is slow, and I want to replace it.

I have looked at Scrapy and some other projects, but none of them fit my needs. This is my last try before I start coding one myself, so if anyone knows of something that solves this problem, please tell me.

Beware!

This answer is tailored for beginners; it is NOT the optimal or the most clever approach.

That said, I highly recommend Scrapy. Try the tutorial. And remember to use Firefox with the Firebug extension to navigate the page and learn the inner paths, XPaths, and HTML locations of your data, which you will need later when writing the parser.
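To make the XPath idea concrete, here is a minimal sketch of pulling data out of a page by path. It is not Scrapy code: it uses the limited XPath subset in Python's standard-library `xml.etree.ElementTree`, which only accepts well-formed markup, and the sample page, the `content` id, and the `extract` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real HTML is often not well-formed).
PAGE = """<html><body>
<div id="content">
  <h1>Title</h1>
  <p class="intro">Hello</p>
  <a href="/next">next</a>
</div>
</body></html>"""


def extract(page):
    """Return the heading text and all link targets found by path."""
    root = ET.fromstring(page)
    # The kind of path you would read off the browser's inspector.
    title = root.find(".//div[@id='content']/h1").text
    # Every <a> element that carries an href attribute.
    links = [a.get("href") for a in root.findall(".//a[@href]")]
    return title, links


title, links = extract(PAGE)
```

A crawler framework would normally give you a fuller XPath engine that tolerates messy HTML; the point here is only how a path found in the inspector maps to an extraction call.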

Check the similar answers "Going from Ruby to Python crawlers" and "Python read my outlook email mailbox and parse messages".

Save yourself time and use Firefox with the Firebug extension (enable Inspect).
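If you do end up writing the crawler yourself, the crawl-and-store loop from the question might look roughly like the sketch below. It is a beginner-level illustration under several assumptions, not a Sphider replacement: it uses only the standard library, it stays on the start URL's host, and it uses `sqlite3` as a stand-in for MySQL so it runs without a server. The `pages` table, the `crawl` and `fetch` names, and the `max_pages` limit are all hypothetical.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, conn, fetch=fetch, max_pages=50):
    """Breadth-first crawl of one host; stores each full page in `pages`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)"
    )
    host = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    stored = 0
    while queue and stored < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that cannot be downloaded
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)",
            (url, html),
        )
        stored += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    conn.commit()
    return stored
```

Usage would be along the lines of `crawl("http://example.com/", sqlite3.connect("pages.db"))`. To target MySQL you would swap in a MySQL DB-API driver and adapt the SQL: such drivers typically use `%s` placeholders and a cursor object, and `INSERT OR REPLACE` and the `TEXT` primary key are SQLite-specific. A real crawler should also respect robots.txt and rate-limit its requests, which this sketch does not do.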


 