
How to print a dictionary of a list of URLs in Python

I've got a list of URLs that come in pairs. The first URL in each pair contains a link to the second URL (the target). I created a function that returns a dictionary that maps the first URLs to the second URLs.

I provided a short list of URL pairs below to show what I'm trying to work with. An actual list may be much longer.

I want to print the number of nodes (the URLs) and the number of edges (the links between URLs). I'm trying to do this with the print_stats function.

What I have so far:

def load_graph(args):
  # Iterate through the file line by line
  url_map = {
    line.split()[0]: line.split()[1]
    for line in args.datafile}
  return url_map
  


def print_stats(graph):
    """Print number of nodes and edges in the given graph"""
   

Not sure where to go from here.

links.txt

https://en.wikipedia.org/wiki/Rolls-Royce --> https://en.wikipedia.org/wiki/Wikipedia:Contents
https://en.wikipedia.org/wiki/Rolls-Royce --> https://en.wikipedia.org/wiki/Rolls-Royce_Limited
https://en.wikipedia.org/wiki/Rolls-Royce_Motor_Cars --> https://en.wikipedia.org/wiki/Bentley
https://en.wikipedia.org/wiki/Rolls --> https://en.wikipedia.org/wiki/Rolls-Royce_Ghost

I would need some clarification to answer this: does each distinct left-side URL count as one node? Rolls-Royce appears twice on the left, so is that two nodes, or one node with two children?

Your current implementation will overwrite the stored value whenever a left-side URL is not unique, which may be a problem. Another approach would be to count the edges from each left-side URL, something like this:

count_dict = {}
with open("links.txt", "r") as f:
    for line in f:
        line = line.strip()  # Drop the trailing newline
        if not line:
            continue  # Skip blank lines
        # Split on the first " --> " into left-side and right-side URL
        key, value = line.split(" --> ", 1)

        # Check if we have had the key previously
        if key not in count_dict:
            count_dict[key] = 1
        else:
            # Increase edge count
            count_dict[key] += 1

for key in count_dict:
    print(f"{key} has {count_dict[key]} edges.")

print(f"There is a total of {len(count_dict)} unique left-side nodes")
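If Rolls-Royce counts as one node with two children, one way to get both totals in print_stats is to map each left-side URL to a set of targets instead of a single value. The following is only a minimal sketch under a few assumptions: each line holds one "source --> target" pair, every URL on either side counts as a node, and a repeated pair counts as a single edge. graph_stats and SAMPLE_LINES are names made up for this example, and load_graph here takes any iterable of lines rather than args. Also note that for lines in this format, line.split()[1] in the question would return the "-->" token rather than the target URL.

```python
from collections import defaultdict

# Hypothetical sample data, copied from the links.txt excerpt in the question.
SAMPLE_LINES = [
    "https://en.wikipedia.org/wiki/Rolls-Royce --> https://en.wikipedia.org/wiki/Wikipedia:Contents\n",
    "https://en.wikipedia.org/wiki/Rolls-Royce --> https://en.wikipedia.org/wiki/Rolls-Royce_Limited\n",
    "https://en.wikipedia.org/wiki/Rolls-Royce_Motor_Cars --> https://en.wikipedia.org/wiki/Bentley\n",
    "https://en.wikipedia.org/wiki/Rolls --> https://en.wikipedia.org/wiki/Rolls-Royce_Ghost\n",
]


def load_graph(lines):
    """Map each source URL to the set of URLs it links to."""
    graph = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: exactly one " --> " separator per line.
        source, target = line.split(" --> ", 1)
        graph[source].add(target)
    return dict(graph)


def graph_stats(graph):
    """Return (number of nodes, number of edges) for the graph."""
    nodes = set(graph)  # start with every source URL
    for targets in graph.values():
        nodes.update(targets)  # add URLs that only appear as targets
    edges = sum(len(targets) for targets in graph.values())
    return len(nodes), edges


def print_stats(graph):
    """Print number of nodes and edges in the given graph"""
    node_count, edge_count = graph_stats(graph)
    print(f"Nodes: {node_count}")
    print(f"Edges: {edge_count}")


if __name__ == "__main__":
    print_stats(load_graph(SAMPLE_LINES))
```

For the four sample lines this reports 7 nodes and 4 edges. To read from a file instead, an open file object can be passed directly, for example load_graph(open("links.txt")), since it is iterated line by line in the same way.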

