I would like to get all the values from the table "Elektriciteit NL" on https://powerhouse.net/forecast-prijzen-onbalans/ . However, after many attempts to find the right XPath with Selenium, I was not able to scrape the table.
I used "Inspect" to copy the table's XPath, intending to determine the length of the table for scraping later. After this failed, I tried using "contains", but this was not successful either. Afterwards I tried a few things with BeautifulSoup, also without any luck.
#%%
import pandas as pd
from selenium import webdriver
#%% powerhouse Elektriciteit NL base & peak
url = "https://powerhouse.net/forecast-prijzen-onbalans/"
#%% open webpagina
driver = webdriver.Chrome(executable_path = path + 'chromedriver.exe')
driver.get(url)
#%%
prices = []
#loop for values in table
for j in range(len(driver.find_elements_by_xpath('//tr[@id="endex_nl_forecast"]/div[3]/table/tbody/tr[1]/td[4]'))):
    base = driver.find_elements_by_xpath('//tr[@id="endex_nl_forecast"]/div[3]/table/tbody/tr[1]/td[4]')[j]
#%%
#trying with BeautifulSoup
from bs4 import BeautifulSoup
import requests
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
table = soup.find('table', id = 'endex_nl_forecast')
rows = soup.find_all('tr')
I would like to have the table in a dataframe and understand how xpath exactly works. I'm kind of new to the whole concept.
If you are open to approaches other than XPath, you can do this without Selenium or XPath. You could just use pandas:
import pandas as pd
table = pd.read_html('https://powerhouse.net/forecast-prijzen-onbalans/')[4]
If you want a text representation of the icons, you could extract the class name of the svg, which describes the arrow direction, from the appropriate td elements.
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
r = requests.get('https://powerhouse.net/forecast-prijzen-onbalans/')
soup = bs(r.content, 'lxml')
table = soup.select_one('#endex_nl_forecast table')
rows = []
headers = [i.text for i in table.select('th')]
for tr in table.select('tr')[1:]:
    rows.append([i.text if i.svg is None else i.svg['class'][2].split('-')[-1] for i in tr.select('td') ])
df = pd.DataFrame(rows, columns = headers)
print(df)
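To make the icon expression above easier to follow, here is a minimal sketch of what `i.svg['class'][2].split('-')[-1]` does. BeautifulSoup returns the `class` attribute as a list of strings, so the expression picks the third class and keeps the part after the last hyphen. The class names and the helper `arrow_direction` below are hypothetical; the real class names on the page may differ.

```python
# Hypothetical class list, shaped like what BeautifulSoup returns for
# <svg class="icon icon-small icon-arrow-up">. Not taken from the real page.
svg_classes = ["icon", "icon-small", "icon-arrow-up"]


def arrow_direction(classes, index=2):
    """Return the last hyphen-separated part of the class at position `index`."""
    return classes[index].split("-")[-1]


direction = arrow_direction(svg_classes)
print(direction)
```

If the page uses a different number or order of classes, the index would need to be adjusted accordingly.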
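You can use the Selenium driver to locate the table and its contents:
import time
from selenium import webdriver
driver = webdriver.Chrome()  # assumes chromedriver can be found on PATH
url = 'https://powerhouse.net/forecast-prijzen-onbalans/'
driver.get(url)
time.sleep(3)  # crude wait so the page can finish loading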
To read the table headers and print them:
tableHeader = driver.find_elements_by_xpath("//*[@id='endex_nl_forecast']//thead//th")
print(tableHeader)
for header in tableHeader:
    print(header.text)
To find the number of rows in the table:
rowElements = driver.find_elements_by_xpath("//*[@id='endex_nl_forecast']//tbody/tr")
print('Total rows in the table:', len(rowElements))
To print each row as is:
for row in rowElements:
    print(row.text)
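Since the question also asks how XPath works, here is a small sketch comparing the question's expression with the one used in this answer. It runs against a made-up, simplified HTML fragment (not the real page) and uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs a leading `.` for relative paths. The assumption, based on the working selectors above, is that the `endex_nl_forecast` id sits on a wrapper element with the table nested somewhere below it. The product names and numbers are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page structure: the id is on a wrapper <div>,
# and the <table> is a descendant of it. All values are invented.
HTML = """
<html><body>
  <div id="endex_nl_forecast">
    <div><h3>Elektriciteit NL</h3></div>
    <div>
      <table>
        <thead><tr><th>Product</th><th>Base</th><th>Peak</th></tr></thead>
        <tbody>
          <tr><td>Cal-A</td><td>10.0</td><td>12.5</td></tr>
          <tr><td>Cal-B</td><td>11.0</td><td>13.5</td></tr>
        </tbody>
      </table>
    </div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(HTML)

# The question's pattern asks for a <tr> that carries the id and then a <div>
# child of that <tr>. No <tr> has the id, so the result is an empty list.
no_match = root.findall(".//tr[@id='endex_nl_forecast']/div[3]/table/tbody/tr[1]/td[4]")

# '*' matches any tag with the id; '//' then searches all descendants,
# so the exact nesting between the wrapper and the table does not matter.
base = ".//*[@id='endex_nl_forecast']"
headers = [th.text for th in root.findall(base + "//thead//th")]
rows = [[td.text for td in tr.findall("td")]
        for tr in root.findall(base + "//tbody/tr")]

# XPath positions are 1-based: tr[1]/td[2] is the second cell of the first row.
first_base = root.find(base + "//tbody/tr[1]/td[2]").text

print(no_match)
print(headers)
print(rows)
print(first_base)
```

The `headers` and `rows` lists have the shape that `pd.DataFrame(rows, columns=headers)` expects. In Selenium 4 the same XPath strings are passed as `driver.find_elements(By.XPATH, ...)`, with `By` imported from `selenium.webdriver.common.by`, because the `find_elements_by_xpath` helpers were removed.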