How can I parse the table content from the website using Selenium?
I'm trying to parse the tables on a sports website into a list of dictionaries to render into a template. This is my first exposure to Selenium; I tried to read the Selenium documentation and wrote this program:
from bs4 import BeautifulSoup
import time
from selenium import webdriver
url = "http://www.espncricinfo.com/rankings/content/page/211270.html"
browser = webdriver.Chrome()
browser.get(url)
time.sleep(3)
html = browser.page_source
soup = BeautifulSoup(html, "lxml")
print(len(soup.find_all("table")))
print(soup.find("table", {"class": "ratingstable"}))
browser.close()
browser.quit()
I'm getting 0 and None as the values. How can I modify the program to get all the values of the table and store them in a list of dictionaries? If you have any other questions, feel free to ask.
First of all, avoid using time.sleep(). It is against all best practices. Use an Explicit Wait instead.
If you inspect the table, you can see that it is located inside an <iframe> tag with name="testbat". So, you'll have to switch to that frame in order to get the contents of the table. It can be done like this:
browser.switch_to.default_content()
browser.switch_to.frame('testbat')
After switching to the frame, use the Explicit Wait as mentioned above.
Complete code:
from bs4 import BeautifulSoup
from selenium import webdriver
# Add the following imports to your program
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
url = "http://www.espncricinfo.com/rankings/content/page/211270.html"
browser = webdriver.Chrome()
browser.get(url)
browser.switch_to.default_content()
browser.switch_to.frame('testbat')
try:
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'ratingstable')))
except TimeoutException:
pass # Handle the time out exception
html = browser.find_element(By.CLASS_NAME, 'ratingstable').get_attribute('innerHTML')  # find_element_by_class_name was removed in Selenium 4
soup = BeautifulSoup(html, "lxml")
You can check whether you've got the table:
>>> print('S.P.D. Smith' in html)
True
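Once you have the table's HTML in soup, turning it into the list of dictionaries you asked for is a plain BeautifulSoup exercise: take the header cells as keys and zip each data row against them. A minimal sketch, using a made-up sample table since the real column names and markup of the ESPN page may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's innerHTML; the live table's
# columns and class names are assumptions here.
html = """
<table class="ratingstable">
  <tr><th>Rank</th><th>Player</th><th>Rating</th></tr>
  <tr><td>1</td><td>S.P.D. Smith</td><td>947</td></tr>
  <tr><td>2</td><td>V. Kohli</td><td>922</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "ratingstable"})

# Header cells become the dictionary keys
headers = [th.get_text(strip=True) for th in table.find_all("th")]

# Each remaining row becomes one dictionary, ready to pass to a template
rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(dict(zip(headers, cells)))

print(rows)
```

This prints a structure like [{'Rank': '1', 'Player': 'S.P.D. Smith', 'Rating': '947'}, ...], which you can render directly in your template.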