How can I scrape the table from " https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17 ",
then find the maximum "OI" under "PUTS", and finally get the other entries in the row that holds that maximum OI?
So far I have only got as far as printing the rows:
import urllib2
from urllib2 import urlopen
import bs4 as bs
url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17'
html = urllib2.urlopen(url).read()
soup = bs.BeautifulSoup(html,'lxml')
table = soup.find('div',id='octable')
rows = table.find_all('tr')
for row in rows:
    print row.text
You have to iterate over all the <td> elements inside each <tr>. You can do this with nested for loops, but a list comprehension is more straightforward. For example:
oi_column = [
    float(t[21].text.strip().replace('-','0').replace(',',''))
    for t in (t.find_all('td') for t in tables.find_all('tr'))
    if len(t) > 20
]
This iterates over every <td> in every <tr> of your table, keeps only the rows with more than 20 cells (which excludes the last row), and applies whatever text cleanup you need. Here it replaces '-' with '0', removes the thousands separators, and converts the text to float.
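The cell cleanup described above can also be pulled into a small helper. This is only a sketch: the name parse_cell is hypothetical, and it assumes that a lone '-' marks an empty cell and that commas are thousands separators. Unlike a blanket replace('-','0'), it leaves the sign of a negative number alone.

```python
def parse_cell(text):
    """Convert a table cell such as ' 1,23,450 ' or '-' to a float."""
    cleaned = text.strip().replace(',', '')
    # Assumption: an empty cell or a lone dash means "no value"; treat it as zero.
    if cleaned in ('', '-'):
        return 0.0
    return float(cleaned)


print(parse_cell(' 1,23,450 '))
print(parse_cell('-'))
print(parse_cell('-12.5'))
```

In the list comprehension, float(...) with the chained replace calls could then be swapped for parse_cell(t[21].text).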
The whole code would be:
from bs4 import BeautifulSoup
import requests
url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
tables = soup.find("table", {"id":"octable"})
# OI values under PUTS (cell index 21), one entry per data row
oi_column = [
    float(t[21].text.strip().replace('-','0').replace(',',''))
    for t in (t.find_all('td') for t in tables.find_all('tr'))
    if len(t) > 20
]
# column to check
print(oi_column)
print("max value : {}".format(max(oi_column)))
print("index of max value : {}".format(oi_column.index(max(oi_column))))
# the row at that index; the offset of 2 accounts for the leading rows
# that the filter above skipped
root = tables.find_all('tr')[2 + oi_column.index(max(oi_column))].find_all('td')
row_items = [
    (
        root[1].text.strip(),
        root[2].text.strip()
        # etc... select the indexes you want to extract from that row
    )
]
print(row_items)
You can find an additional example of scraping a table like this here