Beautiful Soup: not grabbing correct information
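I am using Beautiful Soup to scrape the bold flower names and their corresponding image links from: http://www.all-my-favourite-flower-names.com/list-of-flower-names.html
I want to do this not only for the flowers that start with "A", but for all the other flowers as well (those starting with "B", "C", "D", and so on).
I was able to cobble something together for some of the "A" flowers...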
for flower in soup.find_all('b'):  # Find flower names and append them to the flowers list
    flower = flower.string
    if flower is not None and flower[0] == "A":
        flowers.append(flower.strip('.()'))
for link in soup.find_all('img'):  # Find 'src' in each <img> tag and append it to the links list
    src = link['src']
    if src.startswith('https://'):  # str.strip() removes a set of characters, not a prefix
        src = src[len('https://'):]
    links.append(src)
for straggler in soup.find_all('a'):  # Find the only flower name that doesn't follow the pattern of the others and insert it into the flowers list
    floss = straggler.string
    if floss is not None and floss == "Ageratum houstonianum.":
        flowers.insert(3, floss)
The obvious problem with this is that it is bound to break as soon as anything on the page changes. Could someone give me a hand?
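The problem seems to be that the flowers are paginated across several pages. Something like this should help you loop through the different pages. The code is untested.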
from urllib.request import urlopen
from bs4 import BeautifulSoup

test = {'A': '', 'B': '-B', 'XYZ': '-X-Y-Z'}
flower_list = []
for key, value in test.items():
    page = urlopen('http://www.all-my-favourite-flower-names.com/list-of-flower-names{0}.html'.format(
        value)).read()
    soup = BeautifulSoup(page, 'html.parser')
    # Now do your logic for every page, and probably save the flower names in a list.
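To fill in the per-page logic, one option is to keep URL construction, parsing, and fetching in separate functions so each piece can be checked on its own. The following is only a sketch and has not been run against the live site. It assumes Python 3 and the bs4 package; the helper names `page_url`, `bold_names`, and `scrape_all` are hypothetical, and the only page suffixes known here are the three from the snippet above. It also generalizes the question's `flower[0] == "A"` check by keeping names whose first letter appears in the page key, which may still let through bold text that is not a flower name.

```python
from urllib.request import urlopen

BASE_URL = 'http://www.all-my-favourite-flower-names.com/list-of-flower-names{0}.html'

# Page suffixes taken from the snippet above; any others would have to be
# read off the site's own navigation links (not guessed here).
SUFFIXES = {'A': '', 'B': '-B', 'XYZ': '-X-Y-Z'}


def page_url(suffix):
    """Build the URL of one listing page from its suffix."""
    return BASE_URL.format(suffix)


def bold_names(html):
    """Return the text of every <b> tag that contains a single string."""
    # Third-party import kept local so page_url() works without bs4 installed.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    names = []
    for tag in soup.find_all('b'):
        if tag.string is not None:
            # Drop surrounding spaces, dots and parentheses.
            names.append(tag.string.strip(' .()'))
    return names


def scrape_all():
    """Fetch every known page and map its key to the names found on it."""
    result = {}
    for key, suffix in SUFFIXES.items():
        html = urlopen(page_url(suffix)).read()
        # Keep only names whose first letter belongs to this page's key,
        # e.g. 'X', 'Y' or 'Z' for the 'XYZ' page.
        result[key] = [n for n in bold_names(html) if n and n[0] in key]
    return result
```

Calling `scrape_all()` performs the network requests; `page_url` and `bold_names` can be tried on their own first, for example by passing a saved copy of one page to `bold_names`.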