This is the code I am using to iterate over all elements:
soup_top = bs4.BeautifulSoup(r_top.text, 'html.parser')
selector = '#ContentPlaceHolder1_gvDisplay table tr td:nth-of-type(3) a'
for link in soup_top.select(selector):
    print(link)
The same selector gives a length of 57 when used in JavaScript:
document.querySelectorAll("#ContentPlaceHolder1_gvDisplay table tr td:nth-of-type(3) a").length;
I thought that maybe I was not fetching the contents of the webpage correctly, so I saved a local copy of the page, but the selector in Beautiful Soup still did not select anything. What is going on here?
This is the website I am using the code on: http://www.swapnilpatni.com/law_charts_final.php
It seems that this is due to the parser you used (i.e. html.parser). If I try the same thing with lxml as the parser:
from bs4 import BeautifulSoup
import requests
url = 'http://www.swapnilpatni.com/law_charts_final.php'
r = requests.get(url)
r.raise_for_status()
soup = BeautifulSoup(r.text, 'lxml')
css_select = '#ContentPlaceHolder1_gvDisplay table tr td:nth-of-type(3) a'
links = soup.select(css_select)
print('{} link(s) found'.format(len(links)))
>> 1 link(s) found
for link in links:
    print(link['href'])
>> spadmin/doc/Company Law amendment 1.1.png
With html.parser, the selector only returns a result up to #ContentPlaceHolder1_gvDisplay table tr, and even then it only returns the first tr.
When I run the URL through the W3C Markup Validation Service, this is the error that is returned:
Sorry, I am unable to validate this document because on line 1212 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: utf8 "\xA0" does not map to Unicode
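To illustrate what the validator is complaining about, here is a minimal sketch using only the standard library. The sample bytes and the helper name decode_with_fallback are hypothetical, and cp1252 is only a guess at the encoding the page was really saved in. A lone 0xA0 byte is not valid UTF-8, but it is a non-breaking space in cp1252:

```python
# Hypothetical sample: a lone 0xA0 byte in otherwise ASCII text.
raw = b"Company\xa0Law"


def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Try each encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    # Last resort: keep going, but replace undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


text, used = decode_with_fallback(raw)
print(used, repr(text))
```

If the real page behaves like this sample, passing r.content (the raw bytes) to BeautifulSoup instead of r.text, and letting it detect the encoding, may be worth a try.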
It's likely that html.parser chokes on this as well, while lxml is more fault-tolerant.
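A quick way to check whether the parser is the culprit is to run the same selector through every parser you have installed and compare the counts. This is only a sketch: SAMPLE_HTML is a hypothetical, well-formed stand-in for the real page, and count_matches is a helper name I made up. With the real page you would pass r.text or r.content instead.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical, well-formed stand-in for the real page.
SAMPLE_HTML = """
<div id="ContentPlaceHolder1_gvDisplay">
  <table>
    <tr><td>1</td><td>x</td><td><a href="a.png">A</a></td></tr>
    <tr><td>2</td><td>y</td><td><a href="b.png">B</a></td></tr>
  </table>
</div>
"""

SELECTOR = "#ContentPlaceHolder1_gvDisplay table tr td:nth-of-type(3) a"


def count_matches(markup, selector, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: match_count}, skipping parsers that are not installed."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser library is missing.
            continue
        counts[name] = len(soup.select(selector))
    return counts


print(count_matches(SAMPLE_HTML, SELECTOR))
```

On clean markup the parsers should agree. If the counts differ on the real page, the markup (or its encoding) is being repaired differently by each parser.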