
How to test external URLs or links in a Django website?

Hi, I am building a blogging website in Django 1.8 with Python 3. Users will write blog posts and sometimes add external links. I want to crawl all the pages of this blog site and test whether every external link provided by the users is valid.

How can I do this? Should I use something like Python Scrapy?
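Before crawling anything, you need a way to decide which links count as external. That can be sketched with `urllib.parse` from the standard library; the hostname `myblog.example.com` below is only a placeholder for the blog's own domain:

```python
from urllib.parse import urlparse

SITE_HOST = 'myblog.example.com'  # placeholder for the blog's own domain

def is_external(href):
    """Return True if href points outside SITE_HOST."""
    netloc = urlparse(href).netloc
    # Relative links ('') and same-host links are internal
    return netloc not in ('', SITE_HOST)

print(is_external('/2015/my-post/'))                   # False: relative, internal
print(is_external('http://myblog.example.com/about'))  # False: same host
print(is_external('http://google.com'))                # True: external
```

Only links for which `is_external` returns True would then need to be checked against the live web.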

import fnmatch
import urllib.request


def site_checker(url):
    # Prepend a scheme if the URL doesn't already start with http/https
    url_chk = url.split('/')
    if not fnmatch.fnmatch(url_chk[0], 'http*'):
        url = 'http://%s' % url
    print(url)

    try:
        # If the page returns any content, treat the site as reachable
        response = urllib.request.urlopen(url).read()
        if response:
            print('site is legit')
    except Exception:
        print("not a legit site yo!")


site_checker('google')  ## not a complete url
site_checker('http://google.com')  ## this works

Hopefully this works. urllib will read the HTML of the site; if it's not empty, it's a legit site, otherwise it's not. I also added a URL check to prepend http:// if it's missing.
