I'm building a web scraper and want it to retrieve the URL from a title. This is the code I'm currently using:
for item in g_data:
    print item.contents[1].find_all("a", {"class": "a-link-normal"})[1]
And this prints:
<a class="a-link-normal s-access-detail-page a-text-normal"
href="http://www.amazon.co.uk/Scotch-BUFF-Brown-Packaging-Parcel/dp/B001OYOI5E"
title="3M Scotch BUFF Brown Packaging Parcel Tape 50mm x 66m - Pack of
2"><h2 class="a-size-medium a-color-null s-inline s-access-title
a-text-normal">3M Scotch BUFF Brown Packaging Parcel Tape 50mm x 66m -
Pack of 2</h2></a>
What I would like is to get just the URL:
"http://www.amazon.co.uk/Scotch-BUFF-Brown-Packaging-Parcel/dp/B001OYOI5E"
However, I'm not sure how to target that specific data. Does anyone know how to do this? I would really appreciate it, thank you.
Although this is similar to the other post, it is a different and less complex problem. I think the solution from that post could work here, but it would require rewriting code.
Instead of printing the entire anchor element, you only need the value of its href attribute. A BeautifulSoup tag exposes its attributes through dictionary-style indexing, so you can access it as follows:
for item in g_data:
    print item.contents[1].find_all("a", {"class": "a-link-normal"})[1]['href']
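If some results might have fewer than two matching links, or an anchor without an href, a slightly more defensive variant may be useful. The following is only a minimal sketch: the sample markup, the first link's URL, and the helper name `second_link_href` are made up for illustration, and it assumes the `bs4` package is installed. It uses `tag.get("href")`, which returns `None` when the attribute is missing instead of raising `KeyError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one search result; only the second
# anchor mirrors the element shown in the question.
html = """
<div>
  <a class="a-link-normal" href="http://example.com/first-link">first</a>
  <a class="a-link-normal s-access-detail-page a-text-normal"
     href="http://www.amazon.co.uk/Scotch-BUFF-Brown-Packaging-Parcel/dp/B001OYOI5E"
     title="3M Scotch BUFF Brown Packaging Parcel Tape 50mm x 66m - Pack of 2">
    <h2>3M Scotch BUFF Brown Packaging Parcel Tape 50mm x 66m - Pack of 2</h2>
  </a>
</div>
"""


def second_link_href(container):
    """Return the href of the second 'a-link-normal' anchor, or None."""
    links = container.find_all("a", {"class": "a-link-normal"})
    if len(links) < 2:
        # Not enough matching anchors; avoid an IndexError.
        return None
    # .get() returns None if the anchor has no href attribute.
    return links[1].get("href")


soup = BeautifulSoup(html, "html.parser")
url = second_link_href(soup)
print(url)
```

In the original loop, this helper would be called as `second_link_href(item.contents[1])`, skipping any item for which it returns `None`.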