How can I break a loop when there are no more items to append to a list?

I'm writing a script that extracts internal links from a website. As it visits the internal links in the list, it appends any links it has not seen before to the list.

Once it has appended all internal links, I want to break the loop.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urlsplit

addr = "http://andnow.com/"
base_addr = "{0.scheme}://{0.netloc}/".format(urlsplit(addr))

o = urlparse(addr)
domain = o.hostname

i_url = []

def internal_crawl(url):

    headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}

    r = requests.get(url, headers = headers).content
    soup = BeautifulSoup( r, "html.parser")

    i_url.append(url)
    try:
        for link in [h.get('href') for h in soup.find_all('a')]:
            if domain in link and "mailto:" not in link and "tel:" not in link and not link.startswith('#'):
                if link not in i_url:
                    i_url.append(link)
#               print(link)
            elif "http" not in link and "tel:" not in link and "mailto:" not in link and not link.startswith('#'):
                internal = base_addr + link
                if link not in i_url:
                    i_url.append(internal)
        print(i_url)

    except Exception:
        print("exception")

internal_crawl(base_addr)

for l in i_url:
    internal_crawl(l)

I've tried adding the following code, but I can't get it to work. I'm not sure whether this is because my list changes while I loop over it.

for x in i_url:
    if x == i_url[-1]:
        break

Is there a way to break the loop if the same item is last in the list twice in a row?
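For context on the behaviour the question relies on: a Python `for` loop over a list also visits items that are appended while the loop is running, and it stops by itself once it reaches the end without anything new having been added. The sketch below illustrates this with a made-up `links` dictionary standing in for the pages that `internal_crawl` would fetch. The dictionary and the `crawl_all` name are assumptions for illustration, not part of the original script.

```python
# Hypothetical site map: page -> links found on that page (illustration only).
links = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b", "/c"],
    "/b": ["/a"],
    "/c": [],
}

def crawl_all(start):
    found = [start]
    for page in found:                  # also visits items appended below
        for link in links.get(page, []):
            if link not in found:       # append only links not seen before
                found.append(link)
    return found                        # loop ended: nothing new was appended

result = crawl_all("/")
print(result)  # ['/', '/a', '/b', '/c']
```

Under this reading, no explicit `break` is needed as long as nothing already in the list is appended again.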

I'm not exactly sure what you are trying to do. If I understand correctly, one way would be:

prev = None
for x in i_url:
    if x == prev:
        break
    # do stuff
    prev = x

Is this what you are after?

y = None
i_url = ["x", "y", "z", "z", "a"]
for x in i_url:
    if x == y:
        # same item twice in a row: stop
        print("found ", x)
        break
    else:
        y = x
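Another possible approach, offered as a sketch rather than a definitive fix: in the question's code, `i_url.append(url)` runs on every call and the relative-link branch tests `link` instead of `internal`, so the list may keep growing with duplicates. Tracking visited URLs in a set and pulling work from a queue lets the loop end on its own when the queue is empty. The names `resolve_internal`, `crawl`, `get_hrefs`, and the `fake_pages` data on `example.com` are hypothetical; in a real script `get_hrefs` would fetch the page with `requests` and collect the `href` values with BeautifulSoup.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

def resolve_internal(page_url, href, site_netloc):
    """Return href as an absolute URL if it stays on site_netloc, else None."""
    if not href or href.startswith(("#", "mailto:", "tel:")):
        return None
    absolute = urljoin(page_url, href)      # resolves relative links
    if urlsplit(absolute).netloc != site_netloc:
        return None                         # external link
    return absolute

def crawl(start, get_hrefs):
    """Visit every internal URL reachable from start exactly once."""
    site_netloc = urlsplit(start).netloc
    seen = {start}
    queue = deque([start])
    while queue:                            # ends when nothing is left to visit
        url = queue.popleft()
        for href in get_hrefs(url):
            link = resolve_internal(url, href, site_netloc)
            if link and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

# Made-up pages standing in for real HTTP responses (assumption).
fake_pages = {
    "http://example.com/": ["/about", "#top", "mailto:me@example.com", "http://other.org/"],
    "http://example.com/about": ["/", "contact", None],
    "http://example.com/contact": ["http://example.com/"],
}

pages = crawl("http://example.com/", lambda u: fake_pages.get(u, []))
print(sorted(pages))
```

A set makes the "already seen" check cheap, and because a URL is queued only the first time it is found, the `while queue` loop cannot run forever on a finite site.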
