I am trying to scrape the table below using python. Tried pulling html tags to find the element id_dt1_NGY00 and so on but cannot find it once the page is populated so someone told me use Selenium and did manage to scrape some data.
https://www.insidefutures.com/markets/data.php?page=quote&sym=ng&x=13&y=8
The numbers are updated every 10 minutes so this website is dynamic. Used the following code below but it is printing out everything in a linear format rather than in a format that can be tabular as rows and columns. Included below are two sections of sample output
Contract
Last
Change
Open
High
Low
Volume
Prev. Stl.
Time
Links
May '21 (NGK21)
2.550s
+0.006
2.550
2.550
2.550
1
2.544
05/21/18
Q / C / O
Jun '21 (NGM21)
2.576s
+0.006
0.000
2.576
2.576
0
2.570
05/21/18
Q / C / O
Code below import time from bs4 import BeautifulSoup from selenium import webdriver import pandas as pd
browser = webdriver.Chrome(executable_path= "C:\\Users\\siddk\\PycharmProjects\\WebSraping\\venv\\selenium\\webdriver\\chromedriver.exe")
browser.get(" https://www.insidefutures.com/markets/data.php?page=quote&sym=ng&x=14&y=16 ")
html = browser.page_source soup = BeautifulSoup(html, 'html.parser')
th_tags = soup.find_all('tr') for th in th_tags: print (th.get_text())
I want to extract this data in Panda and analyze averages etc on daily basis. Please help. I have exhausted my strength on doing this myself with multiple iterations to code.
Try the below script to get the tabular data. It is necessary to find the right url which contains the same table but does not get generated dynamically so that you can do your operation without using any browser simulator.
Give it a go:
from bs4 import BeautifulSoup
import requests
url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"
res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
for tr in soup.find(class_="bcQuoteTable").find_all("tr"):
data = [item.get_text(strip=True) for item in tr.find_all(["th","td"])]
print(data)
Rusults are like:
['Contract', 'Last', 'Change', 'Open', 'High', 'Low', 'Volume', 'Prev. Stl.', 'Time', 'Links']
['Cash (NGY00)', '2.770s', '+0.010', '0.000', '2.770', '2.770', '0', '2.760', '05/21/18', 'Q/C/O']
["Jun \\'18 (NGM18)", '2.901', '-0.007', '2.902', '2.903', '2.899', '138', '2.908', '17:11', 'Q/C/O']
["Jul \\'18 (NGN18)", '2.927', '-0.009', '2.928', '2.930', '2.926', '91', '2.936', '17:11', 'Q/C/O']
["Aug \\'18 (NGQ18)", '2.944', '-0.008', '2.945', '2.947', '2.944', '42', '2.952', '17:10', 'Q/C/O']
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.