
How can I break a loop when there are no more items to append to the list?

I'm writing a script that extracts internal links from a website. As it visits each internal link in the list, it appends any links it hasn't seen yet to that same list.

Once it has appended all the internal links, I want to break the loop.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlsplit, urlparse

addr = "http://andnow.com/"
base_addr = "{0.scheme}://{0.netloc}/".format(urlsplit(addr))

o = urlparse(addr)
domain = o.hostname

i_url = []  # internal URLs found so far

def internal_crawl(url):

    headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}

    r = requests.get(url, headers=headers).content
    soup = BeautifulSoup(r, "html.parser")

    i_url.append(url)  # runs on every call, even if url is already in i_url
    try:
        # href=True skips <a> tags that have no href attribute
        for link in [h.get('href') for h in soup.find_all('a', href=True)]:
            # absolute link on the same domain
            if domain in link and "mailto:" not in link and "tel:" not in link and not link.startswith('#'):
                if link not in i_url:
                    i_url.append(link)
                # print(link)
            # relative link: prefix it with the base address
            elif "http" not in link and "tel:" not in link and "mailto:" not in link and not link.startswith('#'):
                internal = base_addr + link
                if internal not in i_url:
                    i_url.append(internal)
        print(i_url)

    except Exception:
        print("exception")

internal_crawl(base_addr)

for l in i_url:
    internal_crawl(l)
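As written, the final loop cannot end on its own: `internal_crawl` calls `i_url.append(url)` unconditionally, so every call made from the `for` loop appends another copy of a URL that is already in the list, and the list grows by at least one item per iteration. A common alternative is a worklist (queue) plus a `seen` set: each URL is queued at most once, and the loop stops by itself when the queue is empty, i.e. when no new links turn up. The sketch below is a minimal offline version under stated assumptions: `PAGES`, `HOST` and `get_links` are hypothetical stand-ins for `requests.get` plus BeautifulSoup so it runs without a network. It also uses `urllib.parse.urljoin` to resolve relative links, which avoids the double slash that `base_addr + link` produces for root-relative links such as `/about`.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

# Hypothetical site: each page maps to the raw href values found on it.
PAGES = {
    "http://example.com/": ["/about", "/blog", "#top", "mailto:hi@example.com"],
    "http://example.com/about": ["/", "/contact", "http://other.org/"],
    "http://example.com/blog": ["/about", "/blog/post-1"],
    "http://example.com/contact": [],
    "http://example.com/blog/post-1": ["/blog"],
}
HOST = "example.com"  # hypothetical domain to stay on


def get_links(url):
    """Hypothetical helper: absolute, same-host links found on `url`."""
    links = []
    for href in PAGES.get(url, []):
        absolute = urljoin(url, href)       # resolve "/about" against the page URL
        absolute = absolute.split("#")[0]   # drop fragments such as "#top"
        parts = urlsplit(absolute)
        # keeps http(s) links on HOST; mailto: and other hosts are skipped
        if parts.scheme in ("http", "https") and parts.netloc == HOST:
            links.append(absolute)
    return links


def crawl(start):
    seen = {start}          # every URL ever queued
    order = []              # URLs in the order they were visited
    queue = deque([start])  # discovered but not yet visited
    while queue:            # ends by itself once nothing new is found
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(crawl("http://example.com/"))
```

Swapping `get_links` for a real fetch-and-parse step keeps the same termination behaviour, since each URL still enters the queue at most once.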

I've tried adding the following code, but I can't get it to work. I'm not sure if this is because my list is changing.

for x in i_url:
    if x == i_url[-1]:
        break
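On whether the changing list is the problem: a Python `for` loop over a list walks it by position, so items appended during the loop are visited too, and the loop finishes by itself once it reaches the end with nothing new added. The small demonstration below (arbitrary numbers, not crawler code) shows both effects; a loop that keeps appending something on every pass, duplicates included, never reaches the end.

```python
items = [1, 2, 3]
visited = []

for x in items:
    visited.append(x)
    # Grow the list while iterating, but only for small values,
    # so that eventually nothing new is appended and the loop ends.
    if x < 5:
        items.append(x + 3)

print(visited)  # → [1, 2, 3, 4, 5, 6, 7]
```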

Is there a way to break the loop if the same item is last on the list twice in a row?

I'm not exactly sure what you are trying to do, but if I understand correctly, one way would be:

prev = None
for x in i_url:
    if x == prev:
        break
    # do stuff
    prev = x

Is this what you are after?

y = None
i_url = ["x", "y", "z", "z", "a"]
for x in i_url:
    if x == y:
        print("found ", x)
        break
    else:
        y = x
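Both answers compare each item with the one just before it. A more compact variant of the same idea pairs each item with its successor using `zip`; the function name `first_repeat` is hypothetical. One caveat: `items[1:]` is a copy taken when the loop starts, so unlike the loops above it will not see items appended while it runs, which makes it suited to a finished list rather than the growing crawl list.

```python
def first_repeat(items):
    """Return the first item equal to the one right before it, or None."""
    for prev, cur in zip(items, items[1:]):
        if cur == prev:
            return cur
    return None


print(first_repeat(["x", "y", "z", "z", "a"]))  # → z
print(first_repeat(["x", "y", "z"]))            # → None
```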
