
'NoneType' object has no attribute 'text' BeautifulSoup Python

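I am trying to scrape data from the URL below, but I keep getting AttributeError: 'NoneType' object has no attribute 'text'

How can I scrape the site so that I loop through every td and get the bilingual text?

Here is what I have so far: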

from bs4 import BeautifulSoup
import requests

url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"

html_content = requests.get(url).text

# Parse the html content
soup = BeautifulSoup(html_content, "lxml")

gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr")  # contains # rows

# Get all the headings of Lists
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right
    headings.append(td.b.text.replace('\n', ' ').strip())

print(headings)

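You left td.b in the for loop, and that is what raises the error: the cells in the table contain no b tag (b is a tag here, not an attribute), so td.b is None and None has no .text. Remove the .b and you get the output.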

from bs4 import BeautifulSoup
import requests

url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"

html_content = requests.get(url).text
# Parse the html content
soup = BeautifulSoup(html_content, "lxml")

gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr")  # list of all <tr> rows

# Collect the column headings from the first row
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right
    headings.append(td.text.replace('\n', ' ').strip())

print(headings)

This is the output I get:

['No.', 'Mongolian text', 'Loosely translated into English']
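To see why the original loop fails, here is a minimal sketch that needs no network access. The HTML string is hypothetical markup standing in for the header row, not a copy of the real page: one cell wraps its text in strong and the other in b. Accessing a child tag that does not exist returns None, and that None is what the .text lookup trips over.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): the first cell
# uses <strong>, the second uses <b>.
html = "<table><tr><td><strong>No.</strong></td><td><b>Bold</b></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
first_td, second_td = soup.find_all("td")

print(first_td.b)        # None, because this cell has no <b> child
print(second_td.b.text)  # Bold

# Reproduce the error from the question on the cell without a <b> tag.
try:
    first_td.b.text
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Reading the text from the cell itself works whatever tag wraps it.
print(first_td.get_text(strip=True))  # No.
```

Calling .text or get_text() on the td avoids depending on which inline tag the page happens to use.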

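The line headings.append(td.b.text.replace('\n', ' ').strip()) raises the error because the cells in the table have no b tag, so td.b evaluates to None.

Also, you do not need to parse the strong text separately; use td.text (or td.get_text(strip=True)) on the cell instead.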

from bs4 import BeautifulSoup
import requests

url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"

html_content = requests.get(url).text

# Parse the html content
soup = BeautifulSoup(html_content, "lxml")

gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr")  # list of all <tr> rows
print(gdp_table_data[0].find_all("td"))  # inspect the raw header cells
# Collect the column headings from the first row
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right and append to headings
    headings.append(td.get_text(strip=True))
print(headings)
# output ['No.', 'Mongolian text', 'Loosely translated into English']
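The question also asks how to loop over every td, not just the headings. The following is a rough sketch of one way to do that, not a tested scraper for the real page. SAMPLE_HTML and the extract_rows helper are hypothetical names introduced for illustration, and the sample row content is made up. Only the table-translations class and the three headings come from the thread.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the data row is invented for the example.
SAMPLE_HTML = """
<table class="table-translations">
  <tr><td><strong>No.</strong></td><td><strong>Mongolian text</strong></td>
      <td><strong>Loosely translated into English</strong></td></tr>
  <tr><td>1</td><td> sample source text </td><td>sample
translation</td></tr>
</table>
"""


def extract_rows(html, table_class="table-translations"):
    """Return each table row as a list of cleaned cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        # find() returns None when nothing matches; bail out instead of
        # triggering another 'NoneType' AttributeError further down.
        return []
    rows = []
    for tr in table.find_all("tr"):
        # split() + join collapses newlines and repeated spaces in each cell.
        cells = [" ".join(td.get_text().split()) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
headings, data = rows[0], rows[1:]
print(headings)
for number, source_text, translation in data:
    print(number, source_text, translation)
```

For the live site, the string returned by requests.get(url).text would be passed to extract_rows in place of SAMPLE_HTML. The sketch searches the table directly for tr elements rather than going through .tbody, since .tbody is itself None when the markup has no explicit tbody element.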

