Python HTML web scraping

Question

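I'm trying to write a Python program that parses the following page and extracts the card sub-brand and brand for a given card BIN: https://www.cardbinlist.com/search.html?bin=371793. The following code snippet retrieves the card type: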

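import requests
from lxml import html  # lxml.html parses real-world (non-XML) HTML
page = requests.get('https://www.cardbinlist.com/search.html?bin=371793')
tree = html.fromstring(page.content)
print("card type: ", tree.xpath("//td//following::td[7]")[0].text)  # positional XPath: depends on the page layout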

However, I'm not sure how to get the brand with similar logic, given this markup:

<th>Brand (Financial Service)</th> 
<td><a href="/AMEX-bin-list.html" target="_blank">AMEX</a></td>

The expression

tree.xpath("//td//following::td[5]")[0].text

returns nothing.

Answer 1

I suggest using BeautifulSoup, since CSS selectors are more convenient than XPath.

Using Beautiful Soup, the code for your question would be:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.cardbinlist.com/search.html?bin=371793')
soup = BeautifulSoup(page.content, 'html.parser')
# Select the <th> element whose text is exactly 'Brand (Financial Service)'
brand_parent = soup.find('th', string='Brand (Financial Service)')
# The value is in the <td> that follows that <th>; .text collects the text of nested tags such as <a>
brand = brand_parent.find_next_sibling('td').text
print(brand)  # output: AMEX
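BeautifulSoup is a third-party package. If you would rather avoid extra dependencies, the same "find the label, take the next cell" idea can be sketched with the standard library's html.parser instead. This is a different technique from the one in the answer, shown only as an alternative. LabelValueParser is a hypothetical helper, the inline HTML is just the row quoted in the question, and the sketch assumes each <th> label is immediately followed by its <td> value (nested tables are not handled). Collecting every label at once means other fields, such as the sub-brand, can be looked up by whatever label the page actually uses.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Hypothetical helper: collect {<th> text: following <td> text} pairs."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._current = None     # "th" or "td" while inside such a cell
        self._buffer = []        # text fragments of the current cell
        self._last_label = None  # most recent <th> text, waiting for its <td>

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is kept, because the buffer
        # is only reset at <th>/<td> boundaries.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # e.g. </a> inside a cell: keep collecting
        text = "".join(self._buffer).strip()
        if tag == "th":
            self._last_label = text
        elif self._last_label is not None:
            self.pairs[self._last_label] = text
            self._last_label = None
        self._current = None


# Hypothetical input: only the row quoted in the question, wrapped in a table.
html_text = """
<table>
  <tr>
    <th>Brand (Financial Service)</th>
    <td><a href="/AMEX-bin-list.html" target="_blank">AMEX</a></td>
  </tr>
</table>
"""

parser = LabelValueParser()
parser.feed(html_text)
print(parser.pairs["Brand (Financial Service)"])  # AMEX
```

On the live page you would feed it page.text from requests; if the real markup differs from the quoted row, the pairing rule above would need adjusting.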

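If you want to use XPath:

Change the XPath to //td//following::td[5]/a and try again. In lxml, an element's .text is only the text that comes before its first child element; this <td> contains nothing but an <a> element, so its .text is None. Appending /a selects the link itself, whose .text is AMEX.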
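lxml's element API follows the ElementTree API, so the standard library's xml.etree.ElementTree can illustrate why appending /a matters, without network access or third-party installs. The snippet is a hypothetical, well-formed copy of the row quoted in the question; ElementTree cannot parse arbitrary real-world HTML, so it is only a stand-in for the lxml.html tree. With lxml.html you could also call text_content() on the <td> to get all of its text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the row quoted in the question.
SNIPPET = """
<table>
  <tr>
    <th>Brand (Financial Service)</th>
    <td><a href="/AMEX-bin-list.html" target="_blank">AMEX</a></td>
  </tr>
</table>
"""

root = ET.fromstring(SNIPPET.strip())
td = root.find(".//td")

# .text is only the text *before* the first child element. The <td> starts
# directly with <a>, so there is no such text and .text is None.
print(repr(td.text))  # None

# Selecting the <a> child (the equivalent of appending /a to the XPath)
# reaches the element that actually holds the text.
print(td.find("a").text)  # AMEX

# Alternatively, join all descendant text without changing the path.
print("".join(td.itertext()))  # AMEX
```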

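To choose a scraping approach, read the following answers:

Xpath vs DOM vs BeautifulSoup vs lxml vs other: Which is the fastest approach to parse a webpage?

Parsing HTML in Python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?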

Answer 1: score 2, accepted, 2018-08-14 02:06:51
