Python: TypeError: unhashable type: 'slice' while web scraping

Question

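I am trying to scrape some information from a website. I can scrape the text I am looking for, but when I try to write a function that appends the text together, I get a TypeError about an unhashable type.

Do you know what might be happening here? Does anyone know how to fix it?

Here is the code in question: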

records = []
for result in results:
    name = result.contents[0][0:-1]

Here is the complete code, for reproduction purposes:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://skinsalvationsf.com/2012/08/updated-comedogenic-ingredients-list/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('td', attrs={'valign':'top'})

records = []
for result in results:
    name = result.contents[0][0:-1]

Sample of the items in results:

<td valign="top" width="33%">Acetylated Lanolin <sup>5</sup></td>,
<td valign="top" width="33%">Coconut Butter<sup> 8</sup></td>,
...
<td valign="top" width="33%"><sup> </sup></td>

Thanks in advance!

Answer 1

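In some of the collected results, contents holds no text at all, only Tag objects. Indexing a Tag looks the key up in the Tag's attribute dictionary, so when you try to take a slice of it you get a TypeError.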
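To see the mechanism in isolation, here is a minimal sketch that uses a plain dict as a stand-in for a Tag's attribute dictionary (the names attrs and lookup are illustrative and not part of BeautifulSoup). One caveat: since Python 3.12 slice objects are hashable, so the same lookup fails with KeyError instead of TypeError, and catching both is the safer choice on newer interpreters.

```python
# Stand-in for a Tag's attribute dictionary (illustrative, not bs4 itself).
attrs = {"valign": "top", "width": "33%"}


def lookup(mapping, key):
    """Return mapping[key], or the name of the exception the lookup raised."""
    try:
        return mapping[key]
    except (TypeError, KeyError) as exc:
        return type(exc).__name__


text = "Acetylated Lanolin "
sliced_text = text[0:-1]               # slicing a string drops the last character
failure = lookup(attrs, slice(0, -1))  # equivalent to attrs[0:-1]

print(repr(sliced_text))
print(failure)  # "TypeError" before Python 3.12, "KeyError" from 3.12 on
```

Slicing works on the string because str implements slicing itself; the dict only ever treats the slice as a key.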

You can catch this kind of error with a try-except block:

for result in results:
    try:
        name = result.contents[0][0:-1]
    except TypeError:
        continue

Alternatively, you can use .strings to select only the NavigableString content:

for result in results:
    name = list(result.strings)[0][0:-1]

However, it looks like only the last item has no text content, so you can simply leave it out:

results = soup.find_all('td', attrs={'valign':'top'})[:-1]

for result in results:
    name = result.contents[0][:-1]
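Dropping the last result by position works for this page, but it depends on where the empty cell sits. A sketch of a type check instead, assuming the sample rows from the question as inline HTML (the html string and the way records is filled are my own illustration, not the asker's code):

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Sample rows copied from the question; the surrounding table is assumed.
html = """<table><tr>
<td valign="top" width="33%">Acetylated Lanolin <sup>5</sup></td>
<td valign="top" width="33%">Coconut Butter<sup> 8</sup></td>
<td valign="top" width="33%"><sup> </sup></td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")

records = []
for td in soup.find_all("td", attrs={"valign": "top"}):
    first = td.contents[0] if td.contents else None
    # Only text nodes can be treated as strings; skip cells that start with a Tag.
    if isinstance(first, NavigableString):
        records.append(str(first).strip())

print(records)
```

Using strip() rather than [:-1] also copes with names that have no trailing space, such as "Coconut Butter" in the sample.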

Answer 2

To understand why you get TypeError: unhashable type: 'slice', read tmadam's answer. In short, on the last iteration result.contents[0] is a bs4.element.Tag object rather than a bs4.element.NavigableString.

Below is a working solution that uses a try-except block, because the last 2 <td> elements in the list yield no stripped_strings and would raise ValueError: not enough values to unpack (expected 2, got 0).

Code (Python 3.6+ if you want to use f-strings):

from bs4 import BeautifulSoup
import requests

url = 'https://skinsalvationsf.com/2012/08/updated-comedogenic-ingredients-list/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'}
html = requests.get(url, headers=headers).text
soup = BeautifulSoup(html, 'html.parser')

tds = soup.find_all('td')
for td in tds:
    try:
        ingredient, rating = td.stripped_strings
    except ValueError:
        pass
    else:
        print(f'{ingredient} -> {rating}')

Output:

Acetylated Lanolin -> 5
Coconut Butter -> 8
...
Xylene -> 7
Octyl Palmitate -> 7

You can also drop the whole try-except-else and leave out the last 2 <td> elements:

tds = soup.find_all('td')[:-2]
for td in tds:
    ingredient, rating = td.stripped_strings
    ...

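However, the site's maintainer may decide to add or remove ingredients, in which case the hard-coded slice could cause the code to miss some of them.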
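One way to avoid both the hard-coded slice and the try-except is to check how many strings each cell yields before unpacking. A minimal sketch, assuming the sample rows from the question as inline HTML and a hypothetical ratings dict to collect the results:

```python
from bs4 import BeautifulSoup

# Sample rows copied from the question; the surrounding table is assumed.
html = """<table><tr>
<td valign="top" width="33%">Acetylated Lanolin <sup>5</sup></td>
<td valign="top" width="33%">Coconut Butter<sup> 8</sup></td>
<td valign="top" width="33%"><sup> </sup></td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")

ratings = {}  # hypothetical container: ingredient name -> rating
for td in soup.find_all("td"):
    parts = list(td.stripped_strings)
    # Keep only cells that hold exactly a name and a rating.
    if len(parts) == 2:
        ingredient, rating = parts
        ratings[ingredient] = int(rating)

print(ratings)
```

Because the filter looks at each cell's content rather than its position, empty cells can appear anywhere in the table without being unpacked.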

Answer 1 (score 2, accepted): 2018-05-03 05:50:10

Answer 2 (score 1): 2018-05-03 09:08:57