Unable to extract the number and the text separately from HTML

Question

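From the HTML below, I want to get the number and the text separately. I can get the number, but extracting the text raises the error shown below. (Note: this runs inside a for loop; the error is raised when the index is not found, since split(b'.')[1] only matches for some of the links.)

Error: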

Traceback (most recent call last):
  File "C:/Users/Computers Zone/Google Drive/Python/SANDWICHTRY.py", line 49, in <module>
    sandwich=soup.find('h1',{'class':'headline'}).encode_contents().strip().split(b'.')[1].decode("utf-8")
IndexError: list index out of range

HTML code:

<h1 class="headline ">1. Old Oak Tap BLT</h1>

My code:

soup=BeautifulSoup(pages,'lxml').find('div',{'id':'page'})
rank=soup.find('h1',{'class':'headline'}).encode_contents().strip().split(b'.')[0].decode("utf-8")
print (rank)
sandwich=soup.find('h1',{'class':'headline'}).encode_contents().strip().split(b'.')[1].decode("utf-8")
print(sandwich)

Answer 1

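This error occurs when there is no . in the headline string, i.e. the second element of the split result does not exist.

To fix this, take the result and split the string, but do not assume that there are always two elements: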
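
To see why index [1] can be missing, here is a small sketch of how bytes.split behaves with and without a dot. The second byte string is a made-up headline without a rank, used only for illustration:

```python
# Splitting on b"." returns a single-element list when no dot is present,
# so asking for index [1] raises IndexError.
with_dot = b"1. Old Oak Tap BLT".split(b".")
without_dot = b"Old Oak Tap BLT".split(b".")  # hypothetical headline without a rank

print(with_dot)          # two elements: rank and name
print(without_dot)       # one element: the whole headline
print(len(without_dot))  # length 1, so index [1] does not exist
```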

from bs4 import BeautifulSoup

pages = '<h1 class="headline">1. Old Oak Tap BLT</h1>'

soup = BeautifulSoup(pages, 'lxml')
titles = soup.find('h1', {'class': 'headline'}).encode_contents().split(b'.')

for text in titles:  # go through all existing list elements
    print(text.decode("utf-8").strip())

Or check that the list has two elements before reading them, for example:

if len(titles) == 2:
    rank = titles[0].decode("utf-8").strip()
    sandwich = titles[1].decode("utf-8").strip()
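
Note that the len(titles) == 2 check also skips headlines whose name itself contains a dot. As a possible alternative, here is a minimal sketch that splits only on the first dot using str.partition. The helper name split_headline and the sample headlines other than the one from the question are assumptions made for illustration; the input text would typically come from something like soup.find('h1', {'class': 'headline'}).get_text().

```python
def split_headline(text):
    """Split 'N. Name' into (rank, name); rank is None when there is no leading number.

    Hypothetical helper, not part of BeautifulSoup.
    """
    # partition splits on the first "." only and always returns three parts
    head, sep, tail = text.strip().partition(".")
    if sep and head.strip().isdigit():
        return head.strip(), tail.strip()
    # No dot, or the part before the first dot is not a number
    return None, text.strip()


print(split_headline("1. Old Oak Tap BLT"))  # headline from the question
print(split_headline("Old Oak Tap BLT"))     # assumed headline without a rank
print(split_headline("2. St. Elmo Club"))    # assumed headline with a dot in the name
```

Because a missing rank is returned as None instead of raising, the surrounding for loop can decide per link whether to skip the entry or keep the name alone.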

Solution 1
1 accepted 2017-11-18 12:34:45
