How to test external URLs or links in a Django website?
Hi, I am building a blogging website in Django 1.8 with Python 3. Users will write blog posts and sometimes add external links. I want to crawl all the pages of this blog and test whether every external link provided by the users is valid.
How can I do this? Should I use something like Scrapy?
import urllib.request

def site_checker(url):
    # Prepend a scheme when the URL does not already start with one.
    if not url.startswith(('http://', 'https://')):
        url = 'http://%s' % url
    print(url)
    try:
        # The timeout keeps an unresponsive host from hanging the check.
        response = urllib.request.urlopen(url, timeout=10).read()
        if response:
            print('site is legit')
    except Exception:
        # Covers DNS failures, refused connections, HTTP errors and timeouts.
        print('not a legit site yo!')

site_checker('google')  # not a complete URL
site_checker('http://google.com')  # this works
Hopefully this works. urllib reads the HTML of the site; if the response is not empty, it's a legit site, otherwise it's not. I also added a URL check that prepends http:// if it's not there.
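To tie this back to the question, the check above still needs the list of external links found on each blog page. Below is a minimal sketch of that step using only the standard library. The names LinkCollector, external_links, broken_links and is_alive are made up for illustration, and is_alive is assumed to be any callable that returns True for a reachable URL (for example, a variant of site_checker that returns a boolean instead of printing).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def external_links(html, page_url):
    """Return absolute http(s) links in html that point to another host."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        # Skip mailto:, javascript: and links back to the blog itself.
        if parts.scheme in ('http', 'https') and parts.netloc != own_host:
            if absolute not in found:
                found.append(absolute)
    return found


def broken_links(html, page_url, is_alive):
    """Return the external links for which is_alive(url) is false."""
    return [url for url in external_links(html, page_url) if not is_alive(url)]
```

Feeding it the rendered HTML of each post (for instance fetched with Django's test client or with urllib) gives the URLs to pass to the checker. For a large blog, a crawler such as Scrapy would handle page discovery, but the extraction and checking logic stays the same.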