How do you add the first site to a graph in a web crawler?

I'm trying to write a web crawler but am having trouble with a fairly simple concept about dictionaries. I want to build a graph (a dictionary) of the links on a website. Here is my code:

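from urllib.parse import urlparse

def crawl(site, graph, dist):
    links = analyze(site)        # analyze() is not shown here; it returns the links found on site
    graph.add(site)              ##graph[site].add(site)? but site isn't yet a key..
    for link in links:
        parsedurl = urlparse(link)
        desc = parsedurl.netloc
        # Skip non-web links (`!= 'http' or != 'https'` is true for every scheme)
        if parsedurl.scheme not in ('http', 'https'):
            continue
        if link in site:         # note: this is a substring test on the URL strings
            continue
        graph[site].add(link)
    return graph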

I can't figure out how to add the site to the graph. I need a key in the graph to call add on (otherwise I get the error "'dict' object has no attribute 'add'"), but the graph is empty, so site isn't a key yet.

Any ideas would be greatly appreciated. Thank you!

Set both the key and the value to the site: graph[site] = site
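
One thing to watch with that assignment: it stores a string as the value, and strings have no add method, so the later graph[site].add(link) in the question would still fail. If the intent is for each site to map to a collection of its links, one option is to create the key with an empty set first. Below is a minimal sketch of that idea, not the asker's full crawler; add_links is a hypothetical helper, and the list of links stands in for whatever analyze(site) returns.

```python
from urllib.parse import urlparse

def add_links(graph, site, links):
    """Record the http(s) links of `site` in `graph` (dict: url -> set of urls).

    Hypothetical helper for illustration; `links` stands in for analyze(site).
    """
    # setdefault inserts site with an empty set the first time it is seen,
    # and returns the existing set on later calls.
    neighbours = graph.setdefault(site, set())
    for link in links:
        if urlparse(link).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, relative links, etc.
        if link == site:
            continue  # skip links back to the page itself
        neighbours.add(link)
    return graph

graph = {}
add_links(graph, "http://example.com",
          ["http://example.com/a", "mailto:x@example.com", "http://example.com"])
```

collections.defaultdict(set) would achieve much the same thing, except that a site with no links only becomes a key if it is looked up explicitly; setdefault makes it a key as soon as it is crawled.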
