
How do you scrape tables with the 'ngcontent' format using Selenium/Python?

Basic tables are fairly easy to scrape with Selenium, but I am having trouble scraping tables whose markup carries "_ngcontent" attributes, such as the ones at https://material.angular.io/components/table/overview. I am trying to load one into a pandas DataFrame.


This is how far I got:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

PATH = r"C:\chromedriver.exe"  # raw string so the backslash is kept literally

driver = webdriver.Chrome(PATH)

URL = 'https://material.angular.io/components/table/overview'

driver.get(URL)

titles = driver.find_element(By.CSS_SELECTOR, '#table-basic > div > div.docs-example-viewer-body.ng-star-inserted > table-basic-example > table > thead')
print(titles.text)

I was only able to get a single element whose text is 'No. Name Weight Symbol', but I cannot iterate through it and scrape the data.

Please assist

To grab the table data easily, you can use Selenium together with pandas: wait for the table to render, take its outerHTML, and pass that to pandas.read_html:

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://material.angular.io/components/table/overview')  # get() returns None, so do not assign it
driver.maximize_window()

# Wait until the first table on the page is rendered, then take its HTML
table_html = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, '(//table)[1]'))
).get_attribute("outerHTML")

# Recent pandas versions deprecate passing literal HTML, so wrap it in StringIO
df = pd.read_html(StringIO(table_html))[0]
print(df)

driver.quit()

Output:

     No.    Name     Weight  Symbol
0    1   Hydrogen   1.0079      H
1    2     Helium   4.0026     He
2    3    Lithium   6.9410     Li
3    4  Beryllium   9.0122     Be
4    5      Boron  10.8110      B
5    6     Carbon  12.0107      C
6    7   Nitrogen  14.0067      N
7    8     Oxygen  15.9994      O
8    9   Fluorine  18.9984      F
9   10       Neon  20.1797     Ne
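
If you would rather iterate over the rows yourself, here is a minimal sketch that uses only the standard library. The `_ngcontent-*` attributes are scoping markers that Angular adds for component styles; they do not change the table structure, so a parser can simply ignore them. `TableRowParser`, `table_rows` and `SAMPLE_HTML` are hypothetical names made up for this illustration, and the sample markup is a shortened stand-in for the `outerHTML` string that Selenium returns, not the real page source.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # attrs (including any _ngcontent-* entries) are deliberately ignored
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(outer_html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(outer_html)
    parser.close()
    return parser.rows


# Shortened, made-up sample in the style of Angular-rendered markup
SAMPLE_HTML = """
<table _ngcontent-abc-c1="" class="mat-table">
  <thead>
    <tr _ngcontent-abc-c1=""><th _ngcontent-abc-c1="">No.</th><th _ngcontent-abc-c1="">Name</th></tr>
  </thead>
  <tbody>
    <tr _ngcontent-abc-c1=""><td _ngcontent-abc-c1=""> 1 </td><td _ngcontent-abc-c1=""> Hydrogen </td></tr>
    <tr _ngcontent-abc-c1=""><td _ngcontent-abc-c1=""> 2 </td><td _ngcontent-abc-c1=""> Helium </td></tr>
  </tbody>
</table>
"""

header, *body = table_rows(SAMPLE_HTML)
print(header)
for row in body:
    print(row)
```

With a live page you would pass the string from `get_attribute("outerHTML")` to `table_rows` instead of `SAMPLE_HTML`. All cells come back as strings, so if you build a DataFrame from them with `pd.DataFrame(body, columns=header)` you still need to convert the numeric columns yourself.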
