Web Scraping through Python BeautifulSoup

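I'm just a beginner in Python.

I'm trying to scrape data from a website and have managed to write the code below.

However, I can't get the href attributes, so I'm not sure how to proceed: I can't go into each listing and fetch its data. I also don't know HTML tags very well, so I suspect I'm not identifying the tags correctly.

Here is my code: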

import requests 
from bs4 import BeautifulSoup

urls = []
for i in range(1,5):
    pages = "https://directory.singaporefintech.org/?p={0}&category=0&zoom=15&is_mile=0&directory_radius=0&view=list&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&featured_only=0&feature=1&perpage=20&sort=random".format(i)
    urls.append(pages)

Data = []
for info in urls:
    page = requests.get(info)
    soup = BeautifulSoup(page.content, 'html.parser')
    links = soup.find_all('a', attrs ={'class' :'sabai-directory-title'})
    hrefs = [link['href'] for link in links]

The code above produces hrefs as an empty list. Any help would be greatly appreciated!

Thanks!!!

The code is fine; the class you are looking for simply doesn't exist on those pages. For example, I checked https://directory.singaporefintech.org/hello-world/?category=0&zoom=15&is_mile=0&directory_radius=0&view=list&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&featured_only=0&feature=1&perpage=20&sort=random, replaced the sabai-directory-title class with comment-reply-link, and got results once I added a print statement.
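When find_all comes back empty, a quick way to check the point above is to count which tag names actually carry the class. This is only a minimal sketch: SAMPLE_HTML is made-up markup shaped like the listing described in this thread (not the real page), and tags_with_class is a hypothetical helper name.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup, NOT copied from the real site: the class sits on the
# <div> that wraps each link, not on the <a> itself.
SAMPLE_HTML = """
<div class="sabai-directory-title">
  <a href="https://example.com/listing/one">One</a>
</div>
<div class="sabai-directory-title">
  <a href="https://example.com/listing/two">Two</a>
</div>
"""


def tags_with_class(html, class_name):
    """Count the tag names of all elements that carry class_name."""
    soup = BeautifulSoup(html, "html.parser")
    return Counter(tag.name for tag in soup.find_all(class_=class_name))


counts = tags_with_class(SAMPLE_HTML, "sabai-directory-title")
print(counts)

# Searching for <a> tags with that class finds nothing in this sample,
# which is the same symptom as the empty hrefs list in the question.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
anchors = soup.find_all("a", attrs={"class": "sabai-directory-title"})
print(anchors)
```

If the counter shows only div (or is empty), the search needs to target a different tag or a different class.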

Hi, I made some changes to the code:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

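# Note: the same front-page URL is added four times,
# so every link shows up four times in the output.
urls = []
for i in range(1, 5):
    pages = "https://directory.singaporefintech.org"
    urls.append(pages)

Data = []
for info in urls:
    page = requests.get(info)
    soup = BeautifulSoup(page.content, 'html.parser')
    # The class is on the <div> wrapping each listing title, not on the <a>.
    links = soup.find_all('div', attrs={'class': 'sabai-directory-title'})
    for link in links:
        # Keep only anchors that have an href and visible text.
        Data.extend([a['href'] for a in link.find_all('a', href=True) if a.text])
pprint(Data)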

Output:

     ['https://directory.singaporefintech.org/directory/listing/silent-eight',
     'https://directory.singaporefintech.org/directory/listing/moolahsense',
     'https://directory.singaporefintech.org/directory/listing/myfinb',
     'https://directory.singaporefintech.org/directory/listing/wefinance',
     'https://directory.singaporefintech.org/directory/listing/quber',
     'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/ceo-1',
     'https://directory.singaporefintech.org/directory/listing/acekards',
     'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
     'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/fundmylife',
     'https://directory.singaporefintech.org/directory/listing/mooments',
     'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/junotele_',
     'https://directory.singaporefintech.org/directory/listing/mobilecover',
     'https://directory.singaporefintech.org/directory/listing/cherrypay',
     'https://directory.singaporefintech.org/directory/listing/toast',
     'https://directory.singaporefintech.org/directory/listing/cashdab',
     'https://directory.singaporefintech.org/directory/listing/silent-eight',
     'https://directory.singaporefintech.org/directory/listing/moolahsense',
     'https://directory.singaporefintech.org/directory/listing/myfinb',
     'https://directory.singaporefintech.org/directory/listing/wefinance',
     'https://directory.singaporefintech.org/directory/listing/quber',
     'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/ceo-1',
     'https://directory.singaporefintech.org/directory/listing/acekards',
     'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
     'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/fundmylife',
     'https://directory.singaporefintech.org/directory/listing/mooments',
     'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/junotele_',
     'https://directory.singaporefintech.org/directory/listing/mobilecover',
     'https://directory.singaporefintech.org/directory/listing/cherrypay',
     'https://directory.singaporefintech.org/directory/listing/toast',
     'https://directory.singaporefintech.org/directory/listing/cashdab',
     'https://directory.singaporefintech.org/directory/listing/silent-eight',
     'https://directory.singaporefintech.org/directory/listing/moolahsense',
     'https://directory.singaporefintech.org/directory/listing/myfinb',
     'https://directory.singaporefintech.org/directory/listing/wefinance',
     'https://directory.singaporefintech.org/directory/listing/quber',
     'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/ceo-1',
     'https://directory.singaporefintech.org/directory/listing/acekards',
     'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
     'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/fundmylife',
     'https://directory.singaporefintech.org/directory/listing/mooments',
     'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/junotele_',
     'https://directory.singaporefintech.org/directory/listing/mobilecover',
     'https://directory.singaporefintech.org/directory/listing/cherrypay',
     'https://directory.singaporefintech.org/directory/listing/toast',
     'https://directory.singaporefintech.org/directory/listing/cashdab',
     'https://directory.singaporefintech.org/directory/listing/silent-eight',
     'https://directory.singaporefintech.org/directory/listing/moolahsense',
     'https://directory.singaporefintech.org/directory/listing/myfinb',
     'https://directory.singaporefintech.org/directory/listing/wefinance',
     'https://directory.singaporefintech.org/directory/listing/quber',
     'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/ceo-1',
     'https://directory.singaporefintech.org/directory/listing/acekards',
     'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
     'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/fundmylife',
     'https://directory.singaporefintech.org/directory/listing/mooments',
     'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
     'https://directory.singaporefintech.org/directory/listing/junotele_',
     'https://directory.singaporefintech.org/directory/listing/mobilecover',
     'https://directory.singaporefintech.org/directory/listing/cherrypay',
     'https://directory.singaporefintech.org/directory/listing/toast',
     'https://directory.singaporefintech.org/directory/listing/cashdab']

Is this the data output you expected?

Hope this helps!!
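Since the same URL is requested four times above, every link is printed four times. A possible way around that, sketched under assumptions: build a distinct URL per page and drop duplicates while keeping order. The parameter names are copied from the URL in the question, and whether the site honours p is not confirmed here. page_url, extract_links and unique are hypothetical helper names, and PAGE_1 and PAGE_2 are made-up stand-ins for downloaded pages.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup

BASE_URL = "https://directory.singaporefintech.org/"


def page_url(page_number):
    """Build one listing-page URL.

    Assumption: "p" selects the page, as in the URL from the question.
    """
    params = {"p": page_number, "view": "list", "perpage": 20}
    return BASE_URL + "?" + urlencode(params)


def extract_links(html):
    """Collect hrefs of anchors inside div.sabai-directory-title."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_="sabai-directory-title"):
        links.extend(a["href"] for a in div.find_all("a", href=True) if a.text)
    return links


def unique(items):
    """Remove duplicates while preserving first-seen order."""
    return list(dict.fromkeys(items))


# Made-up stand-ins for two downloaded pages that share one listing.
PAGE_1 = '<div class="sabai-directory-title"><a href="/listing/a">A</a></div>'
PAGE_2 = (
    '<div class="sabai-directory-title"><a href="/listing/a">A</a></div>'
    '<div class="sabai-directory-title"><a href="/listing/b">B</a></div>'
)

all_links = unique(extract_links(PAGE_1) + extract_links(PAGE_2))
print(all_links)
```

Against the live site, the HTML for each page would come from something like requests.get(page_url(n)).text for n in range(1, 5).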

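You can use a CSS selector to scrape the links. The selector div.sabai-directory-title a finds every <a> tag inside a <div> tag that has the class sabai-directory-title (I updated the URL, because the one you gave returned an error page for me):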

from bs4 import BeautifulSoup
import requests
from pprint import pprint

r = requests.get('https://directory.singaporefintech.org/')
soup = BeautifulSoup(r.text, 'lxml')

hrefs = [a['href'] for a in soup.select('div.sabai-directory-title a')]

pprint(hrefs)

This prints:

['https://directory.singaporefintech.org/directory/listing/silent-eight',
 'https://directory.singaporefintech.org/directory/listing/incomlend',
 'https://directory.singaporefintech.org/directory/listing/bizgrow',
 'https://directory.singaporefintech.org/directory/listing/makerscut',
 'https://directory.singaporefintech.org/directory/listing/soho-fintech',
 'https://directory.singaporefintech.org/directory/listing/dxmarkets',
 'https://directory.singaporefintech.org/directory/listing/fundrevo',
 'https://directory.singaporefintech.org/directory/listing/money4money',
 'https://directory.singaporefintech.org/directory/listing/onelyst',
 'https://directory.singaporefintech.org/directory/listing/hearti-lab',
 'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
 'https://directory.singaporefintech.org/directory/listing/ceo-1',
 'https://directory.singaporefintech.org/directory/listing/arcadier',
 'https://directory.singaporefintech.org/directory/listing/plmp-fintech-pte-ltd',
 'https://directory.singaporefintech.org/directory/listing/cash-in-asia',
 'https://directory.singaporefintech.org/directory/listing/grc-systems',
 'https://directory.singaporefintech.org/directory/listing/sendexpense',
 'https://directory.singaporefintech.org/directory/listing/jinjerjade',
 'https://directory.singaporefintech.org/directory/listing/hatcher',
 'https://directory.singaporefintech.org/directory/listing/fintech-consortium']
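
The question also asks how to go into each listing once the links are known. Here is a rough sketch of that step, under stated assumptions: the structure of the real listing pages is not shown in this thread, so the example only reads each page's <title>. It uses html.parser instead of lxml to avoid the extra dependency. listing_links, page_title, scrape and FAKE_SITE are hypothetical names, and FAKE_SITE is a made-up offline stand-in for the website.

```python
from bs4 import BeautifulSoup


def listing_links(html):
    """Return hrefs matched by the CSS selector from the answer above."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("div.sabai-directory-title a[href]")]


def page_title(html):
    """Return the text of <title>, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


def scrape(index_url, fetch):
    """Map each listing URL to its page title.

    fetch is any callable that takes a URL and returns HTML text.
    """
    return {href: page_title(fetch(href)) for href in listing_links(fetch(index_url))}


# Made-up offline stand-in for the site, so the sketch runs without a network.
FAKE_SITE = {
    "https://example.com/": (
        '<div class="sabai-directory-title">'
        '<a href="https://example.com/listing/one">One</a></div>'
        '<div class="sabai-directory-title">'
        '<a href="https://example.com/listing/two">Two</a></div>'
    ),
    "https://example.com/listing/one": "<html><head><title>One Pte Ltd</title></head></html>",
    "https://example.com/listing/two": "<html><head><title> Two Pte Ltd </title></head></html>",
}

result = scrape("https://example.com/", FAKE_SITE.__getitem__)
print(result)
```

For the live site, fetch could be lambda url: requests.get(url, timeout=10).text, and page_title would be swapped for whatever fields the listing pages actually expose.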

