BeautifulSoup Python: 'NoneType' object has no attribute 'text'
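I'm trying to scrape data from the URL below, but I keep getting AttributeError: 'NoneType' object has no attribute 'text'.
How can I scrape the site so that I loop over every td and get the bilingual text?
Here is what I have so far: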
from bs4 import BeautifulSoup
import requests
url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"
html_content = requests.get(url).text
# Parse the html content
soup = BeautifulSoup(html_content, "lxml")
gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr") # contains # rows
# Get all the headings of Lists
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right
    headings.append(td.b.text.replace('\n', ' ').strip())
print(headings)
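You left td.b in the for loop. That raises the error because none of the table cells contains a <b> tag, so td.b evaluates to None and None has no .text attribute. Remove the .b and you get the expected output.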
from bs4 import BeautifulSoup
import requests
url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"
html_content = requests.get(url).text
# Parse the html content
soup = BeautifulSoup(html_content, "lxml")
gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr") # contains # rows
# Get all the headings of Lists
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right
    headings.append(td.text.replace('\n', ' ').strip())
print(headings)
This is the output I get:
['No.', 'Mongolian text', 'Loosely translated into English']
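To make the failure concrete, here is a minimal sketch that reproduces it on a small inline table. The markup in `sample_html` is a made-up stand-in (the real page may be structured differently), and `html.parser` is used only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
sample_html = """
<table class="table-translations">
  <tr><td><strong>No.</strong></td><td><strong>Mongolian text</strong></td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
first_cell = soup.find("td")

# tag.b is shorthand for tag.find("b"): it returns None when there is no <b> descendant.
missing = first_cell.b
print(missing)

try:
    missing.text  # attribute access on None raises AttributeError
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Reading the text from the cell itself works regardless of the inner tag.
cell_text = first_cell.get_text(strip=True)
print(cell_text)
```

The same reasoning applies to any `tag.name` shortcut: check the result for None before chaining `.text` onto it.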
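In the line headings.append(td.b.text.replace('\n', ' ').strip()), the program throws the error because the table cells contain no b tag, so td.b is None.
Also, you don't need to parse the strong text separately; just use td.text (or td.get_text(strip=True)) instead.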
from bs4 import BeautifulSoup
import requests
url="http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars/"
html_content = requests.get(url).text
# Parse the html content
soup = BeautifulSoup(html_content, "lxml")
gdp_table = soup.find("table", attrs={"class": "table-translations"})
gdp_table_data = gdp_table.tbody.find_all("tr") # contains # rows
print(gdp_table_data[0].find_all("td"))
# Get all the headings of Lists
headings = []
for td in gdp_table_data[0].find_all("td"):
    # remove any newlines and extra spaces from left and right and append to headings
    headings.append(td.get_text(strip=True))
print(headings)
# output ['No.', 'Mongolian text', 'Loosely translated into English']
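The answers above only extract the heading row, while the question also asks how to loop over every td. One possible way to do that is sketched below. The helper name `extract_rows` and the sample rows are hypothetical, not taken from the real page; in practice you would pass `requests.get(url).text` instead of `sample_html`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the row contents are placeholders.
sample_html = """
<table class="table-translations">
  <tr><td><strong>No.</strong></td><td><strong>Mongolian text</strong></td><td><strong>Loosely translated into English</strong></td></tr>
  <tr><td>1</td><td>Mongolian sentence one</td><td>English sentence one</td></tr>
  <tr><td>2</td><td>Mongolian sentence two</td><td>English sentence two</td></tr>
</table>
"""


def extract_rows(html):
    """Return (headings, rows) from the first table with class 'table-translations'."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "table-translations"})
    if table is None:
        # find() returns None when nothing matches, so guard before using it.
        return [], []
    # Search for <tr> from the table itself: table.tbody can also be None,
    # depending on the markup and on the parser.
    all_rows = [
        [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
    ]
    all_rows = [row for row in all_rows if row]  # skip rows without <td> cells
    if not all_rows:
        return [], []
    return all_rows[0], all_rows[1:]


headings, rows = extract_rows(sample_html)
print(headings)
for row in rows:
    print(row)
```

Each entry in `rows` is a list of cell texts in column order, so with this layout `row[1]` and `row[2]` would hold the Mongolian and English text, assuming the real table keeps the three-column structure shown in the headings.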