How can I break a loop when there are no more items to append to a list?
I'm writing a script that extracts internal links from a website. As it visits the internal links in the list, it appends any links it hasn't seen yet to the list.
Once it has appended all the internal links, I want to break the loop.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urlsplit

addr = "http://andnow.com/"
base_addr = "{0.scheme}://{0.netloc}/".format(urlsplit(addr))
o = urlparse(addr)
domain = o.hostname
i_url = []

def internal_crawl(url):
    headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}
    r = requests.get(url, headers=headers).content
    soup = BeautifulSoup(r, "html.parser")
    # Record the page being crawled (this runs on every call, even if url is already listed)
    i_url.append(url)
    try:
        for link in [h.get('href') for h in soup.find_all('a')]:
            if not link:  # <a> tags without an href give None
                continue
            # Absolute links on the same domain, skipping mailto:, tel: and #fragments
            if domain in link and "mailto:" not in link and "tel:" not in link and not link.startswith('#'):
                if link not in i_url:
                    i_url.append(link)
                    # print(link)
            # Relative links: prefix them with the base address
            elif "http" not in link and "tel:" not in link and "mailto:" not in link and not link.startswith('#'):
                internal = base_addr + link
                if internal not in i_url:
                    i_url.append(internal)
        print(i_url)
    except Exception:
        print("exception")

internal_crawl(base_addr)
for l in i_url:
    internal_crawl(l)
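The link filtering above builds absolute URLs with `base_addr + link`, which turns a root-relative link such as `/about` into `http://andnow.com//about` and ignores the path of the page the link was found on. A minimal sketch using only the standard library's `urllib.parse` resolves each href against the page URL instead. `internal_links` is a hypothetical helper name, and treating only http(s) links on the exact hostname as internal is an assumption (a subdomain such as `www.` would be excluded).

```python
from urllib.parse import urldefrag, urljoin, urlparse

def internal_links(page_url, hrefs, domain):
    """Resolve hrefs against page_url and keep unique http(s) links on `domain`.

    Hypothetical helper; "internal" here means the hostname matches exactly.
    """
    result = []
    for href in hrefs:
        if not href:  # <a> tags without an href give None
            continue
        # urljoin handles "/about", "contact.html", "../x" and absolute URLs;
        # mailto: and tel: links come back unchanged with no hostname.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.hostname == domain:
            if absolute not in result:
                result.append(absolute)
    return result

hrefs = ["/about", "contact.html", "#top", "mailto:me@andnow.com",
         "tel:123", "http://andnow.com/news", "https://other.com/", None]
print(internal_links("http://andnow.com/", hrefs, "andnow.com"))
```

Dropping the `#fragment` means a link like `#top` resolves to the page it was found on, so a crawler that skips already-visited URLs will not fetch it twice.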
I've tried adding the following code, but I can't get it to work. I'm not sure whether that's because my list keeps changing while I loop over it.
for x in i_url:
    if x == i_url[-1]:
        break
Is there a way to break the loop if the same item is last in the list twice in a row?
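On the question of the list changing: a Python `for` loop over a list steps through it by index, so items appended during the loop are visited too, and the loop ends by itself once nothing new is appended. The minimal sketch below assumes a hypothetical in-memory `links` dict standing in for fetched pages; because it appends a page only if it is not already in the list, the loop finishes without any explicit `break`. In the question's code, `internal_crawl` also runs `i_url.append(url)` on every call, re-adding each visited page to the end of the list, which appears to be why that loop never runs out of items.

```python
# Hypothetical link graph standing in for real pages (no network access).
links = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b", "/c"],
    "/b": ["/a"],
    "/c": [],
}

visited = ["/"]
for page in visited:             # also iterates over items appended below
    for link in links[page]:
        if link not in visited:  # append only links not seen before
            visited.append(link)
# The loop stops on its own once a full pass over the list adds nothing new.

print(visited)  # → ['/', '/a', '/b', '/c']
```

For large sites, a `set` of seen URLs makes the membership check cheaper than `link not in visited` on a list.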
I'm not exactly sure what you are trying to do, but if I understand correctly, one way would be:
prev = None
for x in i_url:
    if x == prev:
        break
    # do stuff
    prev = x
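The check above can be wrapped in a small function to try it on sample data. This is a sketch: `items_before_repeat` is a hypothetical name, and it uses a private `object()` sentinel instead of `None` for the initial previous value, so a list that starts with `None` is not mistaken for a repeat on the first step.

```python
_MISSING = object()  # unique sentinel; compares unequal to ordinary list items

def items_before_repeat(items):
    """Collect items until one equals the item right before it, then stop."""
    seen = []
    prev = _MISSING
    for x in items:
        if x == prev:
            break  # same item twice in a row
        seen.append(x)
        prev = x
    return seen

print(items_before_repeat(["x", "y", "z", "z", "a"]))  # → ['x', 'y', 'z']
```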
Is this what you are after?
y = None
i_url = ["x", "y", "z", "z", "a"]
for x in i_url:
    if x == y:
        print("found ", x)
        break
    else:
        y = x
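An alternative to tracking the previous item by hand is to pair each item with its successor using `zip`. A sketch with a hypothetical `first_consecutive_repeat` helper, assuming the input is a list (it relies on slicing):

```python
def first_consecutive_repeat(items):
    """Return the first item that appears twice in a row, or None if none does."""
    # zip pairs items[0] with items[1], items[1] with items[2], and so on.
    for current, following in zip(items, items[1:]):
        if current == following:
            return current
    return None

i_url = ["x", "y", "z", "z", "a"]
print(first_consecutive_repeat(i_url))  # → z
```

Because `items[1:]` is a copy taken when the loop starts, this version does not see items appended during iteration, so it suits a finished list rather than one that is still growing. A `None` result is also ambiguous if the repeated item itself could be `None`.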