I'm trying to get the tables from a webpage using bs4 and write them to CSV using pandas.
The webpage has two tables. I can get the first table, but only the header of the second table gets scraped.
Below is the code I've tried.
from urllib2 import Request, urlopen
from bs4 import BeautifulSoup
from scrapelib import table_to_2d
import pandas as pd
ehurl = 'https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx'
hd = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1;WOW64;rv:46.0) Gecko/46.0 Firefox/46.0'}
raq = Request(ehurl, headers=hd)
resp = urlopen(raq)
eh_page = resp.read()
soup = BeautifulSoup(eh_page, "html.parser")
i = 1
for qeros in soup.findAll("table"):
    x = table_to_2d(qeros)
    df = pd.DataFrame(x)
    df.to_csv("fpi" + str(i) + ".csv", sep=",", header=False, index=False)
    i += 1
The function table_to_2d is taken from https://stackoverflow.com/a/48451104/2724299
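Note that urllib2 exists only in Python 2. A minimal sketch of the same request on Python 3, assuming the standard urllib.request module; build_request and fetch are hypothetical helper names introduced for illustration, not part of the original code:

```python
from urllib.request import Request, urlopen

# Same User-Agent string as in the question's code.
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1;WOW64;rv:46.0) '
              'Gecko/46.0 Firefox/46.0')


def build_request(url):
    """Return a Request that sends a browser-like User-Agent header."""
    return Request(url, headers={'User-Agent': USER_AGENT})


def fetch(url):
    """Download the page and return its raw bytes (performs a network call)."""
    with urlopen(build_request(url)) as resp:
        return resp.read()

# Usage (not run here, since it needs network access):
# eh_page = fetch('https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx')
```

The returned bytes can be passed to BeautifulSoup exactly as eh_page is above.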
I'm unsure of the format you want your CSV files to be in, but you can try something like this to get your tables into CSV files:
from bs4 import BeautifulSoup
from requests import get
from csv import writer

url = 'https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx'
r = get(url)
soup = BeautifulSoup(r.text, 'lxml')

# get all tables
tables = soup.find_all('table')

# loop over each table
for num, table in enumerate(tables, start=1):
    # create filename
    filename = 'table-%d.csv' % num
    # open file for writing
    with open(filename, 'w') as f:
        # create csv writer object
        csv_writer = writer(f)
        # go through each row
        rows = table.find_all('tr')
        for row in rows:
            # write headers if any
            headers = row.find_all('th')
            if headers:
                csv_writer.writerow([header.text.strip() for header in headers])
            # write column items, skipping rows that have no td cells
            columns = row.find_all('td')
            if columns:
                csv_writer.writerow([column.text.strip() for column in columns])
This gives the following table-1.csv:
Daily Trends in FPI Investments on 07-Aug-2018
Reporting Date,Debt/Equity/Hybrid,Investment Route,Gross Purchases(Rs. Crore),Gross Sales (Rs. Crore),Net Investment (Rs. Crore),Net Investment US($) million,Conversion (1 USD TO INR)*
07-Aug-2018,Equity,Stock Exchange,4405.92,3972.93,432.99,63.04,Rs.68.6833
Primary market & others,14.43,0.00,14.43,2.10
Sub-total,4420.35,3972.93,447.42,65.14
Debt,Stock Exchange,465.68,116.77,348.91,50.80
Primary market & others,0.00,3.08,(3.08),(0.45)
Sub-total,465.68,119.85,345.83,50.35
Hybrid,Stock Exchange,1.33,3.93,(2.60),(0.38)
Primary market & others,0.00,0.00,0.00,0.00
Sub-total,1.33,3.93,(2.60),(0.38)
Total,4887.36,4096.71,790.65,115.11
The data presented above is compiled on the basis of reports submitted to depositories by DDPs on 07-Aug-2018 and constitutes trades conducted by FPIs/FIIs on and upto the previous trading day(s).Note
and table-2.csv:
Daily Trends in FPI Derivative Trades on 07-Aug-2018
Reporting Date,Derivative Products,Buy,Sell
Open Interest at theend of the date
No. of Contracts,Amount in Crore,No. of Contracts,Amount in Crore,No. of Contracts,Amount in Crore
07-Aug-2018,Index Futures,16899.00,1560.45,17802.00,1706.72,298303.00,26117.55
Index Options,505226.00,51512.43,526331.00,53460.93,654904.00,58508.63
Stock Futures,165411.00,11454.08,158928.00,11105.55,1108615.00,82830.85
Stock Options,84583.00,6297.87,86777.00,6441.33,108437.00,8272.44
Interest Rate Futures,0.00,0.00,0.00,0.00,2530.00,47.60
The above report is compiled on the basis of reports submitted to depositories by NSE and BSE on 07-Aug-2018 and constitutes FPIs/FIIs trading / position of the previous trading day.
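In table-2.csv above, the "Open Interest at the end of the date" header ends up split across two lines, presumably because that cell's text contains a line break. A minimal sketch of one way to tidy this up, assuming you are happy to collapse all whitespace inside a cell; clean_cell and rows_to_csv are hypothetical helper names, and the output goes to an in-memory buffer rather than a file so it is easy to inspect:

```python
import csv
import io


def clean_cell(text):
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return ' '.join(text.split())


def rows_to_csv(rows):
    """Return CSV text for a list of rows, where each row is a list of strings."""
    buffer = io.StringIO()
    csv_writer = csv.writer(buffer, lineterminator='\n')
    for row in rows:
        cleaned = [clean_cell(cell) for cell in row]
        # Skip rows that are empty or contain only blank cells.
        if any(cleaned):
            csv_writer.writerow(cleaned)
    return buffer.getvalue()
```

In the loop above, the same clean_cell call could replace header.text.strip() and column.text.strip().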
It appears that for the second table, the actual tr, th, and td elements are not structured under the table tag. Therefore, scraping all tr, th, and td tags will yield the desired data, and by applying itertools.groupby, the original table structures can be obtained.
import requests, itertools
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx').text, 'html.parser')
table_data = [[j.text for j in (lambda x:i.find_all('td') if not x else x)(i.find_all('th'))] for i in d.find_all('tr')]
final_table = [list(b) for _, b in itertools.groupby(table_data, key=lambda x:x[0].startswith('Daily Trends'))]
table1, table2 = [final_table[i]+final_table[i+1] for i in range(0, len(final_table), 2)]
Output:
table1:
[['Daily Trends in FPI Investments on 08-Aug-2018'], ['Reporting Date', 'Debt/Equity/Hybrid', 'Investment Route', 'Gross Purchases(Rs. Crore)', 'Gross Sales (Rs. Crore)', 'Net Investment (Rs. Crore)', 'Net Investment US($) million', 'Conversion (1 USD TO INR)*'], ['08-Aug-2018', 'Equity', 'Stock Exchange', '3463.67', '3343.93', '119.74', '17.40', ' Rs.68.8000'], ['Primary market & others', '0.00', '7.23', '(7.23)', '(1.05)'], ['Sub-total', '3463.67', '3351.16', '112.51', '16.35'], ['Debt', 'Stock Exchange', '1213.42', '450.23', '763.19', '110.93'], ['Primary market & others', '40.77', '62.95', '(22.18)', '(3.22)'], ['Sub-total', '1254.19', '513.18', '741.01', '107.71'], ['Hybrid', 'Stock Exchange', '3.99', '6.96', '(2.97)', '(0.43)'], ['Primary market & others', '0.00', '0.00', '0.00', '0.00'], ['Sub-total', '3.99', '6.96', '(2.97)', '(0.43)'], ['Total', '4721.85', '3871.30', '850.55', '123.63'], ['The data presented above is compiled on the basis of reports submitted to depositories by DDPs on 08-Aug-2018 and constitutes trades conducted by FPIs/FIIs on and upto the previous trading day(s).Note']]
table2:
[['Daily Trends in FPI Derivative Trades on 08-Aug-2018'], ['Reporting Date', 'Derivative Products', 'Buy', 'Sell', 'Open Interest at the'], ['Open Interest at the'], ['No. of Contracts', 'Amount in Crore', 'No. of Contracts', 'Amount in Crore', 'No. of Contracts', 'Amount in Crore'], ['08-Aug-2018', 'Index Futures', '18797.00', '1732.24', '16696.00', '1600.94', '303684.00', '26636.51'], ['Index Options', '495820.00', '50403.69', '512765.00', '52075.29', '673371.00', '60394.18'], ['Stock Futures', '176472.00', '11999.53', '178301.00', '12020.70', '1116162.00', '83275.79'], ['Stock Options', '98471.00', '6949.88', '101906.00', '7204.18', '116286.00', '8824.33'], ['Interest Rate Futures', '0.00', '0.00', '0.00', '0.00', '2530.00', '47.57'], ['The above report is compiled on the basis of reports submitted to depositories by NSE and BSE on 08-Aug-2018 and constitutes FPIs/FIIs trading / position of the previous trading day.']]
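The grouping step is fairly dense as a one-liner. A more explicit sketch of the same idea, assuming each table starts with a title row whose first cell begins with 'Daily Trends' (as in the output above); split_tables is a hypothetical helper name, and it works on plain lists of strings so it can be tried without fetching the page:

```python
from itertools import groupby


def split_tables(rows, title_prefix='Daily Trends'):
    """Split a flat list of rows into tables, one table per title row."""

    def is_title(row):
        # Guard against empty rows, which have no first cell to inspect.
        return bool(row) and row[0].startswith(title_prefix)

    tables = []
    # groupby yields alternating runs of title rows and non-title rows.
    for title_run, group in groupby(rows, key=is_title):
        group = list(group)
        if title_run:
            # Every title row opens a new table.
            for row in group:
                tables.append([row])
        elif tables:
            # Body rows belong to the most recently opened table;
            # rows that appear before any title row are dropped.
            tables[-1].extend(group)
    return tables
```

With table_data from the code above, split_tables(table_data) should give a list whose items correspond to table1 and table2, provided the page still contains exactly two title rows.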