[英]Web Scraping through Python BeautifulSoup
我只是Python的初學者。
我正在嘗試從網站上抓取數據,並設法編寫了以下代碼。
但是,由於無法獲取href
標簽,因此我不確定如何繼續進行操作,因此無法轉到每個列表並獲取數據。 我也不太了解HTML標記,因此我懷疑我沒有正確識別標記。
這是我的代碼:
import requests
from bs4 import BeautifulSoup
urls = []
for i in range(1,5):
pages = "https://directory.singaporefintech.org/?p={0}&category=0&zoom=15&is_mile=0&directory_radius=0&view=list&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&featured_only=0&feature=1&perpage=20&sort=random".format(i)
urls.append(pages)
Data = []
for info in urls:
page = requests.get(info)
soup = BeautifulSoup(page.content, 'html.parser')
links = soup.find_all('a', attrs ={'class' :'sabai-directory-title'})
hrefs = [link['href'] for link in links]
上面的代碼將hrefs生成為空白列表。 任何幫助將不勝感激!
謝謝!!!
代碼很好,您要查找的類僅在那些頁面上不存在。 例如,檢查https://directory.singaporefintech.org/hello-world/?category=0&zoom=15&is_mile=0&directory_radius=0&view=list&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager后,用評論-回復鏈接替換sabai-directory-title類= 0&featured_only = 0&feature = 1&perpage = 20&sort = random並在我添加打印語句時得到了結果
嗨,我對代碼做了一些更改:
import requests
from bs4 import BeautifulSoup
from pprint import pprint
urls = []
for i in range(1,5):
pages = "https://directory.singaporefintech.org"
urls.append(pages)
Data = []
hrefs = []
for info in urls:
page = requests.get(info)
soup = BeautifulSoup(page.content, 'html.parser')
links = soup.find_all('div', attrs ={'class' :'sabai-directory-title'})
for link in links:
Data.extend([a['href'].encode('ascii') for a in link.find_all('a', href=True) if a.text])
pprint (Data)
輸出:
['https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/moolahsense',
'https://directory.singaporefintech.org/directory/listing/myfinb',
'https://directory.singaporefintech.org/directory/listing/wefinance',
'https://directory.singaporefintech.org/directory/listing/quber',
'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/acekards',
'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/fundmylife',
'https://directory.singaporefintech.org/directory/listing/mooments',
'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/junotele_',
'https://directory.singaporefintech.org/directory/listing/mobilecover',
'https://directory.singaporefintech.org/directory/listing/cherrypay',
'https://directory.singaporefintech.org/directory/listing/toast',
'https://directory.singaporefintech.org/directory/listing/cashdab',
'https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/moolahsense',
'https://directory.singaporefintech.org/directory/listing/myfinb',
'https://directory.singaporefintech.org/directory/listing/wefinance',
'https://directory.singaporefintech.org/directory/listing/quber',
'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/acekards',
'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/fundmylife',
'https://directory.singaporefintech.org/directory/listing/mooments',
'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/junotele_',
'https://directory.singaporefintech.org/directory/listing/mobilecover',
'https://directory.singaporefintech.org/directory/listing/cherrypay',
'https://directory.singaporefintech.org/directory/listing/toast',
'https://directory.singaporefintech.org/directory/listing/cashdab',
'https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/moolahsense',
'https://directory.singaporefintech.org/directory/listing/myfinb',
'https://directory.singaporefintech.org/directory/listing/wefinance',
'https://directory.singaporefintech.org/directory/listing/quber',
'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/acekards',
'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/fundmylife',
'https://directory.singaporefintech.org/directory/listing/mooments',
'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/junotele_',
'https://directory.singaporefintech.org/directory/listing/mobilecover',
'https://directory.singaporefintech.org/directory/listing/cherrypay',
'https://directory.singaporefintech.org/directory/listing/toast',
'https://directory.singaporefintech.org/directory/listing/cashdab',
'https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/moolahsense',
'https://directory.singaporefintech.org/directory/listing/myfinb',
'https://directory.singaporefintech.org/directory/listing/wefinance',
'https://directory.singaporefintech.org/directory/listing/quber',
'https://directory.singaporefintech.org/directory/listing/ayondo-asia-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/acekards',
'https://directory.singaporefintech.org/directory/listing/paper-ink-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/alpha-payments-cloud',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/corris-asset-management-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/fundmylife',
'https://directory.singaporefintech.org/directory/listing/mooments',
'https://directory.singaporefintech.org/directory/listing/venture-capital-network-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/junotele_',
'https://directory.singaporefintech.org/directory/listing/mobilecover',
'https://directory.singaporefintech.org/directory/listing/cherrypay',
'https://directory.singaporefintech.org/directory/listing/toast',
'https://directory.singaporefintech.org/directory/listing/cashdab']
這是您期望的數據輸出嗎?
希望能幫助到你!!
您可以使用CSS選擇器來剪貼鏈接。 選擇器div.sabai-directory-title a
將在<div>
標記內找到帶有sabai-directory-title
類的任何<a>
標記(我更新了URL,您給了我錯誤頁面):
from bs4 import BeautifulSoup
import requests
from pprint import pprint
r = requests.get('https://directory.singaporefintech.org/')
soup = BeautifulSoup(r.text, 'lxml')
hrefs = [a['href'] for a in soup.select('div.sabai-directory-title a')]
pprint(hrefs)
這將打印:
['https://directory.singaporefintech.org/directory/listing/silent-eight',
'https://directory.singaporefintech.org/directory/listing/incomlend',
'https://directory.singaporefintech.org/directory/listing/bizgrow',
'https://directory.singaporefintech.org/directory/listing/makerscut',
'https://directory.singaporefintech.org/directory/listing/soho-fintech',
'https://directory.singaporefintech.org/directory/listing/dxmarkets',
'https://directory.singaporefintech.org/directory/listing/fundrevo',
'https://directory.singaporefintech.org/directory/listing/money4money',
'https://directory.singaporefintech.org/directory/listing/onelyst',
'https://directory.singaporefintech.org/directory/listing/hearti-lab',
'https://directory.singaporefintech.org/directory/listing/samurai-fintech-singapore-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/ceo-1',
'https://directory.singaporefintech.org/directory/listing/arcadier',
'https://directory.singaporefintech.org/directory/listing/plmp-fintech-pte-ltd',
'https://directory.singaporefintech.org/directory/listing/cash-in-asia',
'https://directory.singaporefintech.org/directory/listing/grc-systems',
'https://directory.singaporefintech.org/directory/listing/sendexpense',
'https://directory.singaporefintech.org/directory/listing/jinjerjade',
'https://directory.singaporefintech.org/directory/listing/hatcher',
'https://directory.singaporefintech.org/directory/listing/fintech-consortium']
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.