Beautiful Soup: not scraping the correct information

Question

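I am using Beautiful Soup to scrape the bold flower names and their corresponding image links from: http://www.all-my-favourite-flower-names.com/list-of-flower-names.html

I want to do this not only for the flowers that start with "A", but also for all the other flowers on the site (those starting with "B", "C", "D", and so on).

I was able to cobble something together for some of the "A" flowers...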

flowers = []
links = []

# `soup` is assumed to be the BeautifulSoup object for the page being scraped.
for flower in soup.find_all('b'):  # Finds flower names and appends them to the flowers list
    flower = flower.string
    if flower is not None and flower[0] == "A":
        flowers.append(flower.strip('.()'))

for link in soup.find_all('img'):  # Finds 'src' in each <img> tag and appends it to the links list
    # str.strip('https://') removes a set of characters, not a prefix, so drop the scheme explicitly
    links.append(link['src'].replace('https://', '', 1))

for straggler in soup.find_all('a'):  # Finds the only flower name that doesn't follow the pattern of the other names and inserts it into the flowers list
    floss = straggler.string
    if floss == "Ageratum houstonianum.":
        flowers.insert(3, floss)

The obvious problem with this is that it will certainly break as soon as anything changes. Can someone give me a hand?

Answer 1

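The problem seems to be that the flowers are paginated across several pages. Something like this should help you loop through the different pages. The code is untested.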

from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

# Maps each page label to the suffix used in that page's URL
test = {'A': '', 'B': '-B', 'XYZ': '-X-Y-Z'}
flower_list = []
for key, value in test.items():
    url = 'http://www.all-my-favourite-flower-names.com/list-of-flower-names{0}.html'.format(value)
    page = urlopen(url).read()
    soup = BeautifulSoup(page, 'html.parser')
    # Now do your logic for every page, and probably save the flower names in a list.
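
To make the "do your logic for every page" step concrete, here is a minimal sketch that keeps parsing separate from fetching, so the parsing can be tried on a small HTML string first. The names `parse_flowers`, `scrape_all`, `BASE` and `PAGE_SUFFIXES` are made up for illustration, only the three suffixes from the loop above are listed (any others would have to be checked against the real site), and it assumes every bold tag with text is a flower name, which may not hold for headings on the actual pages.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

BASE = "http://www.all-my-favourite-flower-names.com/list-of-flower-names{0}.html"

# Only the suffixes given in the answer; further pages are an assumption
# and need to be verified against the site's real URLs.
PAGE_SUFFIXES = {"A": "", "B": "-B", "XYZ": "-X-Y-Z"}


def parse_flowers(html):
    """Return the cleaned text of every <b> tag that contains text."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for bold in soup.find_all("b"):
        # get_text() also reads text nested inside child tags such as <a>,
        # where .string may return None; no first-letter filter is applied.
        text = bold.get_text(strip=True).strip(".()")
        if text:
            names.append(text)
    return names


def scrape_all():
    """Fetch each listed page and map its label to the names found on it."""
    result = {}
    for key, suffix in PAGE_SUFFIXES.items():
        with urlopen(BASE.format(suffix)) as response:
            result[key] = parse_flowers(response.read())
    return result
```

Calling `parse_flowers` on a saved copy of one page is a quick way to see whether the bold tags really contain only flower names before running `scrape_all()` against the live site. If the one name that sits in a link turns out to be nested inside a bold tag, `get_text()` would pick it up without the special case; if it is not, it still needs separate handling.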

Answer 1: 1 accepted, 2015-12-11 01:28:12
