
How can I scrape a table and find the corresponding entries for the maximum value in a particular column?

How can I scrape the table from "https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17"

Then find the maximum "OI" under "PUTS" and finally get the corresponding entries in the row containing that maximum OI.

I have reached as far as printing the rows:

import urllib2
from urllib2 import urlopen
import bs4 as bs

url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17'

html = urllib2.urlopen(url).read()
soup = bs.BeautifulSoup(html,'lxml')
table = soup.find('div',id='octable')
rows = table.find_all('tr')
for row in rows:
    print row.text

You have to iterate over all the <td> elements inside each <tr> . You can do this with a bunch of nested for loops, but a list comprehension is more straightforward. Using only this:

oi_column = [
    float(cells[21].text.strip().replace('-', '0').replace(',', ''))
    for cells in (row.find_all('td') for row in tables.find_all('tr'))
    if len(cells) > 20
]

you iterate over every <td> in every <tr> of your table, keep only the rows with more than 20 cells (to exclude the last row), and perform whatever text replacement you need to match your requirement, here converting the text to float.
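For instance, the cleaning step turns a cell showing "-" (no open interest) into 0 and strips the thousands separators before converting to float. A minimal sketch with made-up sample values, just to illustrate the conversion:

# hypothetical sample cell texts, only to show the cleaning/conversion step
samples = ['1,234', '-', '56,789.50']
cleaned = [float(s.strip().replace('-', '0').replace(',', '')) for s in samples]
print(cleaned)  # [1234.0, 0.0, 56789.5]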

The whole code would be:

from bs4 import BeautifulSoup
import requests

url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17'

response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

tables = soup.find("table", {"id":"octable"})

oi_column = [
    float(cells[21].text.strip().replace('-', '0').replace(',', ''))
    for cells in (row.find_all('td') for row in tables.find_all('tr'))
    if len(cells) > 20
]
#column to check
print(oi_column)

print("max value : {}".format(max(oi_column)))
print("index of max value : {}".format(oi_column.index(max(oi_column)))) 

# the row at that index (+2 to account for the header rows that the filter skipped)
root = tables.find_all('tr')[2 + oi_column.index(max(oi_column))].find_all('td')
row_items = [
    (
        root[1].text.strip(),
        root[2].text.strip()
        # etc... select the indexes you want to extract from that row
    )
]
print(row_items)
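
If you prefer to grab every column of that row instead of picking indexes by hand, one small sketch along the same lines is:

# all cell texts of the max-OI row (root is the list of <td> elements found above)
max_row = [td.text.strip() for td in root]
print(max_row)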

You can find an additional example of scraping a table like this here
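
As an alternative sketch, pandas can sometimes read such a table directly. This assumes the page serves plain HTML to the client and that an HTML parser such as lxml is installed; the option-chain table has multi-level headers, so the resulting DataFrame may need extra cleanup:

import pandas as pd
import requests

url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17'
response = requests.get(url)
dataframes = pd.read_html(response.text)  # returns a list of DataFrames, one per <table> in the page
print(dataframes[0].head())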
