
How do I advance to the next item in a nested list? Python

I'm working with a couple of lists, iterating over each. Here's a code segment:

self.links = []
self.iter=iter(self.links)
for tgt in self.links:
    for link in self.mal_list:
        print(link)
        if tgt == link:
           print("Found Suspicious Link: {0}".format(tgt))
           self.count += 1

        else:
           self.count += 1
           self.crawl(self.iter.next())

It's advancing to the next item in the link list just fine. For the malware signature list I tried using a similar iter item, but I'm not entirely sure that's even the best way, and if so, where to place it in my code so that each link that is urlopened from the list is compared to every item in the malware list BEFORE the loop opens up the next item in the link list. Any suggestions?

I'm not sure what you are trying to ask, but you could simplify your code, though this is not necessary.

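self.links = []  # filled in elsewhere before the lines below run
self.non_malware_link = [link for link in self.links if link not in self.mal_list]
# list() forces the crawl calls under Python 3, where map() is lazy
results = list(map(self.crawl, self.non_malware_link))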

On some issues with your code:

  1. self.count is incremented in both branches, so it only counts loop iterations, not suspicious links. With a single loop over the links it is exactly the same as len(self.links).

Apart from the meaning of self.count, everything else looks like it does what it needs to do.

The essential way that you are doing it is fine, but it will be slow.

Try this instead:

for tgt in links:
    if tgt in mal_links:
        # you know that it's a bad link
        print("Found Suspicious Link: {0}".format(tgt))
    else:
        crawl(tgt)

I don't see why you are keeping two iterators going over the list. This will introduce a bug, because you don't call next on self.iter when you detect a malware link. The next time tgt isn't a bad link and you call next, it will advance to the previously detected bad link and you will crawl that. Is there some reason that you feel the need to step over two copies of the iterator instead of just one?
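To make the drift concrete, here is a minimal sketch that records what would be crawled instead of fetching anything. The link names and both helper functions are made up for illustration; only the pattern (a for loop plus a second iterator that is advanced only in the else branch) comes from the question.

```python
links = ["a", "bad", "c", "d"]   # hypothetical link list
mal_links = {"bad"}              # hypothetical malware signatures


def crawl_order_with_second_iterator(links, mal_links):
    """Mimic the question's pattern: loop over links, but crawl next(it)."""
    it = iter(links)
    crawled = []
    for tgt in links:
        if tgt in mal_links:
            continue                 # 'it' is not advanced here, so it falls behind
        crawled.append(next(it))     # may hand back an earlier, flagged link
    return crawled


def crawl_order_with_loop_variable(links, mal_links):
    """Use the loop variable directly; no second iterator to keep in sync."""
    crawled = []
    for tgt in links:
        if tgt not in mal_links:
            crawled.append(tgt)
    return crawled


print(crawl_order_with_second_iterator(links, mal_links))  # ['a', 'bad', 'c']
print(crawl_order_with_loop_variable(links, mal_links))    # ['a', 'c', 'd']
```

In the first version the flagged link is crawled anyway and the last good link is never reached, which is the bug described above.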

Also, your initial code will crawl a page once for every malware link that it is not equal to. This might lead to some angry webmasters, depending on how big your list is.
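A rough sketch of that effect, assuming crawl sits in the else branch of the inner loop as in the question. The function names and example hosts are invented, and the crawl call is replaced by appending to a list so the sketch runs without network access. The second function compares a link against every signature first and only then decides whether to crawl.

```python
def crawl_calls_nested(links, mal_list):
    """Record a crawl for every signature that does NOT match, as in the question."""
    calls = []
    for tgt in links:
        for sig in mal_list:
            if tgt == sig:
                pass                  # flagged as suspicious
            else:
                calls.append(tgt)     # stands in for crawl(...)
    return calls


def crawl_calls_checked_first(links, mal_list):
    """Compare against every signature first, then crawl at most once."""
    calls = []
    for tgt in links:
        if any(tgt == sig for sig in mal_list):
            continue                  # suspicious: never crawled
        calls.append(tgt)             # stands in for crawl(tgt)
    return calls


links = ["good.example", "bad.example"]
mal_list = ["bad.example", "worse.example", "worst.example"]

print(len(crawl_calls_nested(links, mal_list)))    # 5
print(crawl_calls_checked_first(links, mal_list))  # ['good.example']
```

With three signatures, the nested version requests the clean page three times and the flagged page twice; checking first requests the clean page once and the flagged page never.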

Searching for an item in a list is slow. If this is what you're trying to do, use a dict or a set instead of a list for self.mal_list:

mal_list = set(self.mal_list)  # set membership tests are O(1) on average
for tgt in self.links:
    if tgt in mal_list:
        print("Found Suspicious Link: {0}".format(tgt))
        self.count += 1
    else:
        self.count += 1
        self.crawl(tgt)  # crawl the current link; no second iterator needed

or, if you can have self.links as a set as well:

mal_list = set(self.mal_list)
links = set(self.links)
detected = links.intersection(mal_list)  # links that are also known malware
for malware in detected:
    print("Found Suspicious Link: {0}".format(malware))
    self.count += 1
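As a self-contained sketch of the two set-based variants, the hypothetical helper below (split_links is not from the original code) separates the flagged links from the ones that are safe to crawl. It keeps the links in a list so that order and duplicates survive, which converting self.links to a set would not guarantee.

```python
def split_links(links, mal_list):
    """Return (suspicious, to_crawl), preserving the order of `links`."""
    mal_set = set(mal_list)   # build the set once, then reuse it for every lookup
    suspicious = [link for link in links if link in mal_set]
    to_crawl = [link for link in links if link not in mal_set]
    return suspicious, to_crawl


links = ["a.example", "bad.example", "b.example"]
mal_list = ["bad.example", "worse.example"]

suspicious, to_crawl = split_links(links, mal_list)
print(suspicious)                   # ['bad.example']
print(to_crawl)                     # ['a.example', 'b.example']

# Same detection via intersection, when order and duplicates do not matter.
print(set(links) & set(mal_list))   # {'bad.example'}
```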
