'NoneType' object has no attribute 'text' (BeautifulSoup, Python)

Question

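I am trying to scrape data from the URL below, but I keep getting AttributeError: 'NoneType' object has no attribute 'text'.

How can I scrape the site so that I loop over each td and get the bilingual text?

This is what I have so far: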

from bs4 import BeautifulSoup
import requests

url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"

html_content = requests.get(url).text

# Parse the html content
soup = BeautifulSoup(html_content, "lxml")

gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr")  # contains # rows

# Get all the headings of Lists
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right
    headings.append(td.b.text.replace('\n', ' ').strip())

print(headings)

Answer 1

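You left td.b in the for loop, and that is what raises the error. td.b looks for a b tag inside each cell, and no cell in this table contains one, so td.b is None and None has no .text attribute. Remove the .b and you get the output.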

from bs4 import BeautifulSoup
import requests

url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"

html_content = requests.get(url).text
# Parse the html content
soup = BeautifulSoup(html_content, "lxml")

gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr")  # contains # rows

# Get all the headings of Lists
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right
    headings.append(td.text.replace('\n', ' ').strip())

print(headings)

This is the output I get:

['No.', 'Mongolian text', 'Loosely translated into English']
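To make the cause concrete, here is a minimal sketch that reproduces the None value without touching the network. The inline HTML is an assumption about how the header row is marked up (labels wrapped in strong rather than b); bold_or_cell_text is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Assumed markup: header labels wrapped in <strong>, with no <b> tags anywhere.
html = (
    "<table><tr>"
    "<td><strong>No.</strong></td>"
    "<td><strong>Mongolian text</strong></td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

first_td = soup.find("td")
print(first_td.b)       # None, because the cell has no <b> child
print(first_td.strong)  # the <strong> tag that is actually there


def bold_or_cell_text(td):
    """Hypothetical helper: use the <b> text if present, else the whole cell."""
    node = td.b if td.b is not None else td
    return node.get_text(strip=True)


headings = [bold_or_cell_text(td) for td in soup.find_all("td")]
print(headings)
```

Checking for None before calling .text is the general guard for this error; here it simply falls back to the text of the whole cell.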

Answer 2

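The line headings.append(td.b.text.replace('\n', ' ').strip()) raises the error because the table cells contain no b tag, so td.b is None.

Also, you do not need to parse the strong text separately; read the text of the whole cell with td.text (or td.get_text(strip=True)) instead.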
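The different ways of reading a cell's text behave slightly differently, which a small comparison may help show. This is only a sketch on a made-up cell (the em tag and the whitespace are assumptions chosen to expose the difference), not the markup of the real page.

```python
from bs4 import BeautifulSoup

# Made-up cell with two child tags and newlines between them.
html = "<td>\n  <strong>Mongolian</strong>\n  <em>text</em>\n</td>"
td = BeautifulSoup(html, "html.parser").td

# .text keeps all whitespace, so it needs manual cleanup afterwards.
manual = td.text.replace("\n", " ").strip()

# strip=True strips every text fragment and joins them with no separator.
stripped = td.get_text(strip=True)

# Passing a separator keeps a space between the fragments.
spaced = td.get_text(" ", strip=True)

print(repr(manual))
print(repr(stripped))
print(repr(spaced))
```

When a cell can contain several tags, passing a separator such as " " avoids words being glued together.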

from bs4 import BeautifulSoup
import requests

url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"

html_content = requests.get(url).text

# Parse the html content
soup = BeautifulSoup(html_content, "lxml")

gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr")  # contains # rows
print(gdp_table_data[0].find_all("td"))
# Get all the headings of Lists
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right and append to headings
    headings.append(td.get_text(strip=True))
print(headings)
# output ['No.', 'Mongolian text', 'Loosely translated into English']
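Both answers stop at the header row, while the question asks for the bilingual text of every row. A minimal sketch of that loop follows. The inline table stands in for the real page and its body rows are placeholders, not the actual tale; the names rows, body, and pairs are illustrative.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page; body rows are placeholder text.
html = """
<table class="table-translations">
  <tr><td><strong>No.</strong></td><td><strong>Mongolian text</strong></td>
      <td><strong>Loosely translated into English</strong></td></tr>
  <tr><td>1</td><td>mongolian sentence one</td><td>English sentence one</td></tr>
  <tr><td>2</td><td>mongolian sentence two</td><td>English sentence two</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", class_="table-translations")
if table is None:
    # find() returns None when nothing matches; fail with a clear message.
    raise SystemExit("table.table-translations not found")

rows = []
# Searching from the table itself does not depend on a <tbody> being present.
for tr in table.find_all("tr"):
    cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
    if cells:  # skip rows that have no <td> cells
        rows.append(cells)

headings, body = rows[0], rows[1:]
pairs = [dict(zip(headings, cells)) for cells in body]

for pair in pairs:
    print(pair["Mongolian text"], "|", pair["Loosely translated into English"])
```

To run it against the real site, html would come from requests.get(url).text as in the answers above.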
