
Scraping Table using Python and Selenium

I am trying to scrape the table below using Python. I tried pulling the HTML tags to find elements such as id_dt1_NGY00, but I cannot find them once the page is populated. Someone told me to use Selenium, and with it I did manage to scrape some data.

https://www.insidefutures.com/markets/data.php?page=quote&sym=ng&x=13&y=8

The numbers are updated every 10 minutes, so the page is dynamic. I used the code below, but it prints everything in a linear format rather than as rows and columns that could form a table. Two sections of sample output are included below:

Contract
Last  
Change
Open
High  
Low
Volume
Prev. Stl.
Time
Links

May '21 (NGK21)

2.550s
+0.006
2.550
2.550
2.550
1
2.544
05/21/18
Q / C / O

Jun '21 (NGM21)

2.576s
+0.006
0.000
2.576
2.576
0
2.570
05/21/18
Q / C / O

Code below:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import pandas as pd

# Selenium 4 takes the driver path through Service; newer releases no longer accept executable_path=
browser = webdriver.Chrome(service=Service(r"C:\Users\siddk\PycharmProjects\WebSraping\venv\selenium\webdriver\chromedriver.exe"))
browser.get("https://www.insidefutures.com/markets/data.php?page=quote&sym=ng&x=14&y=16")

html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')

# get_text() on a whole <tr> joins all its cells, hence the linear output
for tr in soup.find_all('tr'):
    print(tr.get_text())
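The linear output comes from calling get_text() on a whole <tr>, which concatenates every cell of the row. A minimal sketch, assuming the rendered page_source contains an ordinary <table> of <tr> rows: read each <th>/<td> separately so that every row becomes a list of cells. The helper name table_rows and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def table_rows(html):
    """Return each <tr> in the HTML as a list of its cell texts (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # One entry per header/data cell instead of one string per row
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows


# Hypothetical sample markup shaped like the quote table
sample = """
<table>
  <tr><th>Contract</th><th>Last</th><th>Change</th></tr>
  <tr><td>May '21 (NGK21)</td><td>2.550s</td><td>+0.006</td></tr>
</table>
"""

for row in table_rows(sample):
    print(row)
```

With Selenium, the same helper would be called as table_rows(browser.page_source).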

I want to load this data into pandas and analyze daily averages, etc. Please help; I have exhausted myself trying to do this with multiple iterations of the code.
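Because the page only shows the current snapshot, daily averages require snapshots collected over time (for example, one fetch every 10 minutes), each stamped with its fetch time. A minimal pandas sketch, assuming each snapshot has already been reduced to (fetch time, contract, last price); the function daily_averages, the column names fetched_at, contract and last, and the sample values are all hypothetical.

```python
import pandas as pd


def daily_averages(snapshots):
    """Average the last price per contract per calendar day (hypothetical helper).

    snapshots: iterable of (fetched_at, contract, last) tuples, where
    fetched_at is a timestamp string or datetime and last is a float.
    """
    df = pd.DataFrame(snapshots, columns=["fetched_at", "contract", "last"])
    df["fetched_at"] = pd.to_datetime(df["fetched_at"])
    df["date"] = df["fetched_at"].dt.date
    # Mean over all snapshots taken on the same day for the same contract
    return df.groupby(["date", "contract"])["last"].mean()


# Made-up sample values for illustration only
snapshots = [
    ("2018-05-21 17:00", "NGM18", 2.900),
    ("2018-05-21 17:10", "NGM18", 2.902),
    ("2018-05-22 09:00", "NGM18", 2.910),
]

print(daily_averages(snapshots))
```

Appending each new snapshot to a CSV file and reloading it before grouping is one simple way to keep the history between runs.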

Try the script below to get the tabular data. The trick is to find the URL that serves the same table without generating it dynamically; then you can do your processing without any browser simulator.
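Finding that URL is the key step. One way to look for it, assuming the insidefutures page embeds the quote table through an <iframe> (an assumption; the Network tab of the browser's developer tools shows where the table's HTML actually comes from), is to list the src of every iframe in the rendered page. The helper iframe_sources and the sample markup are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_sources(html, base_url):
    """Return the absolute src URL of every <iframe> in the HTML (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only iframes that actually have a src attribute;
    # urljoin resolves relative and scheme-relative ("//host/...") URLs
    return [urljoin(base_url, frame["src"]) for frame in soup.find_all("iframe", src=True)]


# Hypothetical sample markup with a scheme-relative iframe URL
page = '<div><iframe src="//shared.websol.barchart.com/quotes/quote.php?sym=ng"></iframe></div>'

print(iframe_sources(page, "https://www.insidefutures.com/markets/data.php"))
```

With Selenium, the call would be iframe_sources(browser.page_source, browser.current_url); once a static URL is found, the page can be fetched with requests alone.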

Give it a go:

from bs4 import BeautifulSoup
import requests

# URL that returns the same table without JavaScript rendering
url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"

res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")  # the "lxml" parser requires the lxml package
for tr in soup.find(class_="bcQuoteTable").find_all("tr"):
    # One list entry per header/data cell, so each row prints as a list
    data = [item.get_text(strip=True) for item in tr.find_all(["th", "td"])]
    print(data)

Results look like:

['Contract', 'Last', 'Change', 'Open', 'High', 'Low', 'Volume', 'Prev. Stl.', 'Time', 'Links']
['Cash (NGY00)', '2.770s', '+0.010', '0.000', '2.770', '2.770', '0', '2.760', '05/21/18', 'Q/C/O']
["Jun \\'18 (NGM18)", '2.901', '-0.007', '2.902', '2.903', '2.899', '138', '2.908', '17:11', 'Q/C/O']
["Jul \\'18 (NGN18)", '2.927', '-0.009', '2.928', '2.930', '2.926', '91', '2.936', '17:11', 'Q/C/O']
["Aug \\'18 (NGQ18)", '2.944', '-0.008', '2.945', '2.947', '2.944', '42', '2.952', '17:10', 'Q/C/O']
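To analyze these rows in pandas, the first list can serve as the header and the rest as data. The price strings need cleaning before they become numbers: some carry a trailing s (as in 2.770s) and changes carry a leading sign. A minimal sketch, assuming rows has the shape printed above; to_frame and PRICE_COLUMNS are hypothetical names.

```python
import pandas as pd

# Price columns, named after the header row printed above
PRICE_COLUMNS = ["Last", "Change", "Open", "High", "Low", "Prev. Stl."]


def to_frame(rows):
    """Turn [header, row, row, ...] into a DataFrame with numeric prices (hypothetical helper)."""
    df = pd.DataFrame(rows[1:], columns=rows[0])
    for col in PRICE_COLUMNS:
        # rstrip drops the trailing "s" seen in "2.770s"; float() accepts "+0.010"
        df[col] = df[col].map(lambda v: float(v.rstrip("s")))
    df["Volume"] = df["Volume"].map(int)
    return df


# First rows of the output shown above
rows = [
    ['Contract', 'Last', 'Change', 'Open', 'High', 'Low', 'Volume', 'Prev. Stl.', 'Time', 'Links'],
    ['Cash (NGY00)', '2.770s', '+0.010', '0.000', '2.770', '2.770', '0', '2.760', '05/21/18', 'Q/C/O'],
    ["Jun \\'18 (NGM18)", '2.901', '-0.007', '2.902', '2.903', '2.899', '138', '2.908', '17:11', 'Q/C/O'],
]

df = to_frame(rows)
print(df[["Contract", "Last", "Change", "Volume"]])
print(df["Last"].mean())
```

The Time column mixes dates (05/21/18) and clock times (17:11), so it is left as text here rather than parsed.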
