def get_next_target(page):
    start_link = page.find('<a href="')
    if start_link == -1:
        return None, 0
    end_link = page.find('">', start_link)
    url = page[start_link + 9 : end_link]
    return url, end_link

def print_all_links(page):
    url = True
    while url != None:
        url, endpos = get_next_target(page)
        if url:
            print(url) #True
            page = page[endpos : ]
        else:
            break

page = '<div id="top_bin"><div id="top_content" class="width960"><div class="udacity float-left"><a href="http://udacity.com"><a href="http://udacity.com"><a href="http://udacity.com"><a href="http://udacity.com">'
print(print_all_links(page))
My question: when I run this, it prints four URLs, which is what I expect. But when I change the loop condition to while url == True:
it prints only one URL. What is the reason? Isn't != None equivalent to == True?
Note that it isn't considered very good design in the first place to return this kind of sentinel. get_next_target should return a target, and nothing else (ignoring, for now, the state needed to find the next target). If there is an error, raise an exception. In this case, the lack of another target isn't really an error, but as we'll see, it does signal the end of the iteration. There is already an exception for that: StopIteration.
def get_next_target(page):
    start_link = page.find('<a href="')
    if start_link == -1:
        raise StopIteration
    end_link = page.find('">', start_link)
    url = page[start_link + 9 : end_link]
    return url, end_link

def print_all_links(page):
    while True:
        try:
            url, endpos = get_next_target(page)
            print(url)
            page = page[endpos:]
        except StopIteration:
            break
We can write a better iterator for returning links from a given page, though, that doesn't expose the state needed to parse the page.
def get_targets(page):
    while True:
        start_link = page.find('<a href="')
        if start_link == -1:
            break
        end_link = page.find('">', start_link)
        yield page[start_link + 9:end_link]
        # Drop the part already scanned so the next find() sees the next link.
        page = page[end_link:]

def print_all_links(page):
    for url in get_targets(page):
        print(url)
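As a rough sketch of the same generator idea, the variant below keeps a search offset and passes it as the second argument of str.find, so the page string is never sliced. The name iter_targets, the example.com URL, and the extra check for an unterminated tag are illustrative assumptions, not part of the original answer; it is written for Python 3.

```python
def iter_targets(page):
    """Yield each href value in page, scanning left to right (illustrative sketch)."""
    marker = '<a href="'
    pos = 0
    while True:
        start_link = page.find(marker, pos)
        if start_link == -1:
            return  # ends the generator; a for loop over it stops cleanly
        end_link = page.find('">', start_link)
        if end_link == -1:
            return  # unterminated tag: stop rather than yield a bogus slice
        yield page[start_link + len(marker):end_link]
        pos = end_link  # resume the search after this link

# Sample input; the second URL is made up for the example.
page = '<div><a href="http://udacity.com"><a href="http://example.com">'
found = list(iter_targets(page))
print(found)
```

Because the caller only sees URLs, it can use a plain for loop or list() and never has to handle positions or sentinels.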
What if url == "http://stackoverflow.com"? Then it's not equal to True, so the while loop stops. But it is unequal to None, so if you check for that then the loop continues.
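To make this concrete, here is a small sketch (Python 3 syntax) that evaluates both loop conditions for the three kinds of value url takes in the question: the initial True, a URL string, and the final None. The helper name conditions is made up for illustration.

```python
# Values url takes in the question's loop: the initial True,
# then a URL string, and finally None when no link is left.
values = [True, "http://udacity.com", None]

def conditions(url):
    """Return (url != None, url == True) for one value (hypothetical helper)."""
    return (url != None, url == True)

results = [conditions(v) for v in values]
for v, (not_none, equals_true) in zip(values, results):
    print(repr(v), "| != None:", not_none, "| == True:", equals_true)
```

Only the middle row differs: a non-empty string passes the != None test but fails the == True test, which is why the second version of the loop exits after the first URL.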
If you instead said
if url:
or
if bool(url) == True:
then it would work as desired. url is only True one time; after the first iteration it is a string, and "anystring" != True, but bool("anystring"), where "anystring" is not the empty string, is True.
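A minimal sketch of the loop written this way, assuming Python 3: get_next_target is copied from the question, while collect_links is a hypothetical variant of print_all_links that returns the URLs so they can be checked. It tests the sentinel with is not None, which is the usual way to compare against None.

```python
def get_next_target(page):
    # Same logic as the question: (url, end position), or (None, 0) if no link is left.
    start_link = page.find('<a href="')
    if start_link == -1:
        return None, 0
    end_link = page.find('">', start_link)
    return page[start_link + 9:end_link], end_link

def collect_links(page):
    """Hypothetical variant of print_all_links that returns the URLs it finds."""
    links = []
    url, endpos = get_next_target(page)
    while url is not None:  # identity test against the None sentinel
        links.append(url)
        page = page[endpos:]
        url, endpos = get_next_target(page)
    return links

page = '<a href="http://udacity.com"><a href="http://udacity.com">'
links = collect_links(page)
print(links)

# Truthiness versus equality for strings.
print(bool("anystring"), bool(""), "anystring" == True)
```

One difference worth knowing: a truthiness test such as while url: would also stop at an empty href, because bool("") is False, whereas is not None stops only at the sentinel.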