
How do I advance to the next item in a nested list? (Python)

Working with a couple of lists, iterating over each. Here's a code segment:

self.links = []
self.iter=iter(self.links)
for tgt in self.links:
    for link in self.mal_list:
        print(link)
        if tgt == link:
           print("Found Suspicious Link: {0}".format(tgt))
           self.count += 1

        else:
           self.count += 1
           self.crawl(self.iter.next())

It advances to the next item in the link list just fine. For the malware signature list I tried using a similar iterator, but I'm not sure that is even the best way, and if it is, where to place it in my code so that each link that is opened with urlopen is compared to every item in the malware list BEFORE the loop opens the next item in the link list. Any suggestions?

I'm not sure what you are trying to ask, but you could simplify your code, though this is not strictly necessary.

self.links = []
self.non_malware_link = [link for link in self.links if link not in self.mal_list]
results = map(self.crawl, self.non_malware_link)

On some issues with your code:

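  1. self.count is incremented in both branches, so it does not count suspicious links; it only counts loop iterations (len(self.links) * len(self.mal_list) for the nested loops shown)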

Apart from the meaning of self.count, everything else looks like it does what it needs to do.
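
One caveat about the map call in the simplified version: the question's use of .next() suggests Python 2, where map returns a list and runs immediately. In Python 3, map returns a lazy iterator, so nothing is crawled until the result is consumed. A minimal sketch of that behaviour, assuming Python 3; the crawl function, the crawled list and the example URLs are hypothetical stand-ins, not the asker's code:

```python
crawled = []                       # records what the stand-in crawl() visited

def crawl(link):
    """Hypothetical stand-in for self.crawl."""
    crawled.append(link)
    return link

links = ["http://a.example", "http://bad.example"]
mal_list = ["http://bad.example"]
non_malware_links = [link for link in links if link not in mal_list]

lazy = map(crawl, non_malware_links)
crawled_before = list(crawled)     # still empty in Python 3: map is lazy
results = list(lazy)               # consuming the iterator runs crawl()

print(crawled_before)              # []
print(results)                     # ['http://a.example']
```

If the crawl is only wanted for its side effects, a plain for loop is probably clearer than map.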

The essential way that you are doing it is fine, but it will be slow.

Try this instead:

for tgt in links:
    if tgt in mal_links:
        # you know that it's a bad link
        print("Found Suspicious Link: {0}".format(tgt))
    else:
        crawl(tgt)

I don't see why you are keeping two iterators going over the list. This introduces a bug, because you don't call next on self.iter when you detect a malware link. The next time tgt isn't a bad link and you call next, self.iter advances to the previously detected bad link and you crawl that. Is there some reason that you feel the need to step over two copies of the iterator instead of just one?
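
The desynchronisation described above can be reproduced in a few lines. This is a minimal sketch rather than the asker's code: the link names, the contents of mal_list, and the crawled list standing in for self.crawl are made up for illustration, and it uses Python 3's next(it) instead of it.next().

```python
# Hypothetical data: one bad link in the middle of the list.
links = ["a", "bad", "c"]
mal_list = {"bad"}

crawled = []          # stands in for self.crawl(...)
it = iter(links)      # the second, independent iterator

for tgt in links:
    if tgt in mal_list:
        # The for loop moves on, but `it` is not advanced here.
        continue
    crawled.append(next(it))

print(crawled)        # ['a', 'bad']
```

The bad link ends up crawled and "c" is never visited, which is why crawling tgt directly is the safer choice.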

Also, your initial code crawls a page once for every malware entry that does not match it, rather than once per link. Depending on how big your list is, this might lead to some angry webmasters.
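
To make the repeated crawling concrete, here is a small sketch that counts calls instead of crawling. The data is hypothetical, a Counter stands in for the crawl call, and the nested version crawls tgt directly rather than using the second iterator. The second loop shows one possible way to compare a link against every signature before deciding, using any().

```python
from collections import Counter

links = ["a", "b", "x"]          # hypothetical links
mal_list = ["x", "y", "z"]       # hypothetical signatures

# Crawl in the inner else branch, as in the question's nested loops.
nested_calls = Counter()
for tgt in links:
    for link in mal_list:
        if tgt != link:
            nested_calls[tgt] += 1   # stands in for crawl(tgt)

# Check the whole signature list first, then crawl at most once.
single_calls = Counter()
for tgt in links:
    if not any(tgt == link for link in mal_list):
        single_calls[tgt] += 1       # stands in for crawl(tgt)

print(dict(nested_calls))   # {'a': 3, 'b': 3, 'x': 2}
print(dict(single_calls))   # {'a': 1, 'b': 1}
```

Note that in the nested version even the bad link "x" is crawled twice, once for each signature it does not match.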

Searching for an item in a list is slow. If that is what you're trying to do, use a set (or a dict) instead of a list for self.mal_list:

mal_list = set(self.mal_list)  # set membership tests are O(1) on average
for tgt in self.links:
    if tgt in mal_list:
        print("Found Suspicious Link: {0}".format(tgt))
        self.count += 1
    else:
        self.count += 1
        self.crawl(tgt)  # crawl the link itself; no second iterator needed

Or, if self.links can be a set as well:

mal_list = set(self.mal_list)
links = set(self.links)
detected = links.intersection(mal_list)  # links that match a malware signature
for malware in detected:
    print("Found Suspicious Link: {0}".format(malware))
    self.count += 1
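
Keep in mind that sets drop duplicates and do not preserve order. If the order of the links matters, a sketch along the following lines keeps the link list as a list and converts only the signature list to a set. The function name split_links and the example URLs are hypothetical.

```python
def split_links(links, mal_list):
    """Return (suspicious, clean) lists, preserving the order of links."""
    signatures = set(mal_list)   # average-case O(1) membership tests
    suspicious = [tgt for tgt in links if tgt in signatures]
    clean = [tgt for tgt in links if tgt not in signatures]
    return suspicious, clean

suspicious, clean = split_links(
    ["http://a.example", "http://bad.example", "http://c.example"],
    ["http://bad.example", "http://worse.example"],
)
print(suspicious)   # ['http://bad.example']
print(clean)        # ['http://a.example', 'http://c.example']
```

Only the links in clean would then be passed to the crawl step, once each.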
