Scraping HTML data from a website in Python

I'm trying to scrape certain pieces of HTML data from certain websites, but I can't seem to get the parts I want. For instance, I set myself the challenge of scraping the number of followers from this blog, but I can't seem to do so.

I've tried using urllib, request, BeautifulSoup, as well as Jam API.

Here's what my code looks like at the moment:

from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 location of urlopen

# Download the page and parse it (the "lxml" parser must be installed)
html = urlopen('http://freelegalconsultancy.blogspot.co.uk/')
soup = BeautifulSoup(html, "lxml")
print(soup)

How would I go about pulling the number of followers in this instance?

You can't grab the follower count this way, because it sits in a widget that is loaded by JavaScript, so it is not in the HTML that urlopen returns. What you can grab are the parts of the HTML that are actually there, by CSS class, by id, or by element.

For example:

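from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 location of urlopen

html = urlopen('http://freelegalconsultancy.blogspot.co.uk/')
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, "lxml")

# The blog title is part of the static HTML, so it can be selected by element
assert soup.h1.string == '\nLAW FOR ALL-M.MURALI MOHAN\n'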
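
To make the selection options above concrete, here is a minimal sketch that runs on an inline HTML string instead of the live blog. The markup, the class name `title`, and the ids `description` and `followers` are all made up for illustration; they are not taken from the real page. It shows selecting by element, by class, by id, and by CSS selector, and also why a value filled in by a script is missing: BeautifulSoup parses the markup but never executes the JavaScript.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page; the ids and class
# names are assumptions for this sketch, not the real blog's markup.
SAMPLE_HTML = """
<html>
  <body>
    <h1 class="title">Example Blog</h1>
    <div id="description">Notes on scraping</div>
    <div id="followers"></div>
    <script>
      document.getElementById("followers").textContent = "123 followers";
    </script>
  </body>
</html>
"""

# html.parser ships with Python, so no extra parser package is needed here
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Select by element name: the first <h1> in the document
by_element = soup.h1.get_text(strip=True)

# Select by CSS class (class_ avoids clashing with the Python keyword)
by_class = soup.find("h1", class_="title").get_text(strip=True)

# Select by id
by_id = soup.find(id="description").get_text(strip=True)

# Select with a CSS selector
by_css = soup.select_one("div#description").get_text(strip=True)

# The script is never run, so the placeholder <div> stays empty
followers = soup.find(id="followers").get_text(strip=True)

print(by_element, by_class, by_id, by_css, repr(followers))
```

If the number you want comes back empty like `followers` does here, that is usually a sign the page fills it in after loading, and fetching the raw HTML alone will not be enough.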


 