
Scraping Data with Scrapy in Python

I want to help a friend analyze posts on social networks (Facebook, Twitter, LinkedIn, etc.) as well as several weblogs and websites.

I have several questions, which I have tried to categorize:

When it comes to scraping data, my idea is to collect social media data via the platforms' APIs, and to collect data from other sites via RSS or by crawling them with the Scrapy library. I would like to know whether Scrapy is efficient enough to give good results in a short time and with minimal resource usage.

Technically, Scrapy should do the job just fine, so long as you code it right and find the paths you need, either from the APIs or by analyzing the code of the sites.
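For the RSS route mentioned in the question, you may not need a full crawler at all. Here is a minimal sketch, assuming a standard RSS 2.0 feed, that pulls the title, link, and date of each post using only the standard library. The sample feed, the example.com URLs, and the `parse_rss_items` helper are made up for illustration; a real feed may use different tags (Atom feeds, for example, use `entry` instead of `item`).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for a real blog's RSS document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/posts/1</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/posts/2</link>
      <pubDate>Tue, 02 Jan 2024 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            # findtext returns the default when a child tag is missing
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return posts


if __name__ == "__main__":
    for post in parse_rss_items(SAMPLE_FEED):
        print(post["published"], post["title"], post["link"])
```

In practice you would download the feed text first and pass it to the same function. Scrapy becomes more useful once a site has no feed and you have to follow links and extract content from the pages themselves.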

Be aware, though, that using "automated means" to crawl or scrape data from these sites is a breach of their respective terms of use agreements (Twitter is pretty lax on this, though). This means that if they see a bunch of requests coming from your IP address and think you might be either (A) using a bot or (B) performing a DoS attack, they'll shut you down fast, and you might have law enforcement officers (LEOs) knocking on (or down) your door.

A lot of these sites do have ways to request permission to do so, but I doubt they give permission to just anybody.
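For sites where crawling is allowed, you can at least avoid looking like a flood of requests from one IP address. Below is a sketch of a Scrapy `settings.py` fragment that slows the crawler down. The setting names are Scrapy's own, but the values and the contact URL in the user agent are assumptions picked for illustration, not recommendations for any particular site.

```python
# settings.py fragment for a Scrapy project (values are illustrative assumptions)

# Respect each site's robots.txt rules before fetching pages.
ROBOTSTXT_OBEY = True

# Wait roughly this many seconds between requests to the same site.
DOWNLOAD_DELAY = 2.0

# Never hit a single domain with parallel requests.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the delay to how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Identify the crawler honestly; the name and URL here are hypothetical.
USER_AGENT = "friend-post-analysis (+https://example.com/contact)"
```

None of this makes scraping permitted where the terms of use forbid it. It only reduces the load you put on sites that do allow it.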


 