
How to detect a Joomla website?

I'm trying to create a script that detects whether a given URL points to a Joomla website.

For now I have:

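import requests  # third-party HTTP library

def is_joomla(url):
    # Path of the XML manifest checked in the original attempt.
    manifest = url.rstrip('/') + '/administrator/manifests/files/joomla.xml'
    try:
        response = requests.get(manifest, timeout=10)
    except requests.RequestException:
        return False
    # Simplified: a 200 response whose body mentions "joomla" counts as a hit.
    return response.status_code == 200 and 'joomla' in response.text.lower()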

Another option is to search the lowercased HTML for the string "joomla", but that is not reliable. I know that some Joomla sites can't be detected, but most of them should be.

Are there other signs that would help me detect whether a site runs Joomla? I don't care about the version.

I would check several points to make sure the page you're crawling is a Joomla site.

  1. Check whether DOMAIN/administrator/ responds with a 200. If it does, check the metadata to make sure it is Joomla. On the administration page you can find <meta name="generator" content="Joomla! - Open Source Content Management" />
  2. Check for DOMAIN/robots.txt, which Joomla ships by default, and search for "Joomla" in this file. This should also return true.
  3. You can check other indicators in the HTML, such as whether the stylesheets are loaded from a "templates" folder.

I think that if checks 1 and 3 return true, you can be very sure that the crawled page is a Joomla site. Check 2 is not safe, because some people might delete their robots.txt.
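The three checks above could be sketched roughly as follows. This is only an illustration: the function names are made up, the functions work on text that has already been downloaded (status code, admin page HTML, front page HTML, robots.txt), and the generator pattern assumes the attribute order shown in the tag quoted above.

```python
import re

# Matches a generator meta tag whose content starts with "Joomla".
# Assumes name comes before content, as in the tag quoted in the answer.
GENERATOR_RE = re.compile(
    r'<meta\s+name=["\']generator["\']\s+content=["\']joomla',
    re.IGNORECASE,
)
# Matches an href/src attribute that points into a "templates" folder.
TEMPLATES_RE = re.compile(
    r'(?:href|src)=["\'][^"\']*/templates/[^"\']+',
    re.IGNORECASE,
)


def admin_page_is_joomla(status_code, admin_html):
    """Check 1: /administrator/ answers 200 and carries the generator tag."""
    return status_code == 200 and GENERATOR_RE.search(admin_html) is not None


def robots_mentions_joomla(robots_txt):
    """Check 2: robots.txt mentions Joomla (the file may have been deleted)."""
    return "joomla" in robots_txt.lower()


def references_templates_folder(page_html):
    """Check 3: stylesheets or scripts are loaded from a templates folder."""
    return TEMPLATES_RE.search(page_html) is not None


def looks_like_joomla(status_code, admin_html, page_html):
    """Require checks 1 and 3 together, as suggested in the answer."""
    return (admin_page_is_joomla(status_code, admin_html)
            and references_templates_folder(page_html))
```

Keeping the checks free of network code means they can be combined with whatever HTTP client the crawler already uses, and check 2 can still be consulted as supporting evidence when robots.txt exists.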

You can tell that a website is running Joomla by simply checking the response headers (X-CF-Powered-By) sent by the site. Here is a sample:

HTTP/1.1 200 OK
Date: Fri, 28 Jul 2017 09:02:35 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
X-CF-Powered-By: CF-Joomla 0.1.5
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires: Wed, 02 Aug 2017 09:02:35 GMT
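A minimal sketch of that header check, assuming the headers are already available as a plain name-to-value mapping (the helper name joomla_header_hints is hypothetical). It scans every header value rather than only X-CF-Powered-By, since a site may not send that particular header at all.

```python
def joomla_header_hints(headers):
    """Return the (name, value) pairs whose value mentions Joomla."""
    return [
        (name, value)
        for name, value in headers.items()
        if "joomla" in value.lower()
    ]


# Headers taken from the sample response above.
sample_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "X-CF-Powered-By": "CF-Joomla 0.1.5",
}
hints = joomla_header_hints(sample_headers)
```

An empty result does not prove the site is not Joomla; it only means no header gave it away.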

Unless the HTML file is already on disk, you first need to open a connection and download it. After that, you can use BeautifulSoup to parse it and look for the tag you are interested in.
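As a rough illustration of the parsing step, here is a sketch that looks for the generator meta tag in already-downloaded HTML. Note that it uses the standard library's html.parser instead of BeautifulSoup, purely to avoid an extra dependency; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collects the content attribute of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def generator_is_joomla(html):
    """Return True if any generator meta tag in the HTML mentions Joomla."""
    finder = GeneratorFinder()
    finder.feed(html)
    return any("joomla" in g.lower() for g in finder.generators)
```

With BeautifulSoup the same lookup would be a search for a meta tag whose name attribute is "generator"; the logic stays the same either way.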
