Scrape all tables from a webpage using python bs4

I'm trying to scrape the tables from a webpage using bs4 and write them to CSV files using pandas.

The webpage has two tables. I can get the first table, but only the header of the second table gets scraped.

Below is the code I've tried.

from urllib.request import Request, urlopen  # Python 3; on Python 2 these live in urllib2
from bs4 import BeautifulSoup
from scrapelib import table_to_2d
import pandas as pd

ehurl = 'https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx'
hd = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1;WOW64;rv:46.0) Gecko/46.0 Firefox/46.0'}

raq = Request(ehurl, headers=hd)
resp = urlopen(raq)
eh_page = resp.read()

soup = BeautifulSoup(eh_page, "html.parser")

i=1
for qeros in soup.findAll("table"):
    x = table_to_2d(qeros)
    df = pd.DataFrame(x)
    df.to_csv("fpi" + str(i) + ".csv", sep=",", header=False, index=False)
    i += 1

The function table_to_2d is taken from https://stackoverflow.com/a/48451104/2724299

I'm unsure of the format you want your CSV files to be in, but you can try something like this to get your tables into CSV files:

from bs4 import BeautifulSoup
from requests import get
from csv import writer

url = 'https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx'

r = get(url)
soup = BeautifulSoup(r.text, 'lxml')


# get all tables
tables = soup.find_all('table')

# loop over each table
for num, table in enumerate(tables, start=1):

    # create filename
    filename = 'table-%d.csv' % num

    # open file for writing
    # newline='' lets the csv module control line endings itself
    with open(filename, 'w', newline='') as f:

        # create csv writer object
        csv_writer = writer(f)

        # go through each row
        rows = table.find_all('tr')
        for row in rows:

            # write headers if any
            headers = row.find_all('th')
            if headers:
                csv_writer.writerow([header.text.strip() for header in headers])

            # write column items
            columns = row.find_all('td')
            csv_writer.writerow([column.text.strip() for column in columns])

Which gives the following table-1.csv:

Daily Trends in FPI Investments on 07-Aug-2018

Reporting Date,Debt/Equity/Hybrid,Investment Route,Gross Purchases(Rs. Crore),Gross Sales (Rs. Crore),Net Investment (Rs. Crore),Net Investment US($) million,Conversion (1 USD TO INR)*

07-Aug-2018,Equity,Stock Exchange,4405.92,3972.93,432.99,63.04,Rs.68.6833
Primary market & others,14.43,0.00,14.43,2.10
Sub-total,4420.35,3972.93,447.42,65.14
Debt,Stock Exchange,465.68,116.77,348.91,50.80
Primary market & others,0.00,3.08,(3.08),(0.45)
Sub-total,465.68,119.85,345.83,50.35
Hybrid,Stock Exchange,1.33,3.93,(2.60),(0.38)
Primary market & others,0.00,0.00,0.00,0.00
Sub-total,1.33,3.93,(2.60),(0.38)
Total,4887.36,4096.71,790.65,115.11
The data presented above is compiled on the basis of reports submitted to depositories by DDPs on 07-Aug-2018 and constitutes trades conducted by FPIs/FIIs on and upto the previous trading day(s).Note

and table-2.csv:

Daily Trends in FPI Derivative Trades on 07-Aug-2018

Reporting Date,Derivative Products,Buy,Sell

Open Interest at theend of the date

No. of Contracts,Amount in Crore,No. of Contracts,Amount in Crore,No. of Contracts,Amount in Crore

07-Aug-2018,Index Futures,16899.00,1560.45,17802.00,1706.72,298303.00,26117.55
Index Options,505226.00,51512.43,526331.00,53460.93,654904.00,58508.63
Stock Futures,165411.00,11454.08,158928.00,11105.55,1108615.00,82830.85
Stock Options,84583.00,6297.87,86777.00,6441.33,108437.00,8272.44
Interest Rate Futures,0.00,0.00,0.00,0.00,2530.00,47.60
The above report is compiled on the basis of reports submitted to depositories by NSE and BSE on 07-Aug-2018 and constitutes  FPIs/FIIs trading / position of the previous trading day.
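
The blank lines in the files above appear to come from rows that contain only th cells: the list of td cells for such a row is empty, and an empty row is still written. A minimal sketch of one way to avoid that, assuming the rows have already been extracted as lists of strings (write_rows and sample_rows are hypothetical names used only for illustration, and the sample values are copied from the output above):

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write every non-empty row to fileobj as CSV and return how many were written."""
    writer = csv.writer(fileobj)
    written = 0
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not cells:
            # a row with no cells would otherwise become a blank line
            continue
        writer.writerow(cells)
        written += 1
    return written


# hand-made rows standing in for the scraped cells
sample_rows = [
    ["Reporting Date", "Derivative Products", "Buy", "Sell"],
    [],
    ["07-Aug-2018", "Index Futures", "16899.00", "1560.45"],
]

buffer = io.StringIO()
count = write_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
```

The same function can be handed a real file object instead of the in-memory buffer.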

It appears that for the second table, the actual tr, th, and td elements are not nested under the table tag. Scraping every tr, th, and td element in the document therefore yields the desired data, and itertools.groupby can then be applied to recover the original table structures.

import requests, itertools
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx').text, 'html.parser')
# for each row, take its th cells if it has any, otherwise its td cells
table_data = [[j.text for j in (i.find_all('th') or i.find_all('td'))] for i in d.find_all('tr')]
final_table = [list(b) for _, b in itertools.groupby(table_data, key=lambda x:x[0].startswith('Daily Trends'))]
table1, table2 = [final_table[i]+final_table[i+1] for i in range(0, len(final_table), 2)]

Output:

table1:

[['Daily Trends in FPI Investments on 08-Aug-2018'], ['Reporting Date', 'Debt/Equity/Hybrid', 'Investment Route', 'Gross Purchases(Rs. Crore)', 'Gross Sales (Rs. Crore)', 'Net Investment (Rs. Crore)', 'Net Investment US($) million', 'Conversion (1 USD TO INR)*'], ['08-Aug-2018', 'Equity', 'Stock Exchange', '3463.67', '3343.93', '119.74', '17.40', ' Rs.68.8000'], ['Primary market & others', '0.00', '7.23', '(7.23)', '(1.05)'], ['Sub-total', '3463.67', '3351.16', '112.51', '16.35'], ['Debt', 'Stock Exchange', '1213.42', '450.23', '763.19', '110.93'], ['Primary market & others', '40.77', '62.95', '(22.18)', '(3.22)'], ['Sub-total', '1254.19', '513.18', '741.01', '107.71'], ['Hybrid', 'Stock Exchange', '3.99', '6.96', '(2.97)', '(0.43)'], ['Primary market & others', '0.00', '0.00', '0.00', '0.00'], ['Sub-total', '3.99', '6.96', '(2.97)', '(0.43)'], ['Total', '4721.85', '3871.30', '850.55', '123.63'], ['The data presented above is compiled on the basis of reports submitted to depositories by DDPs on 08-Aug-2018 and constitutes trades conducted by FPIs/FIIs on and upto the previous trading day(s).Note']]

table2:

[['Daily Trends in FPI Derivative Trades on 08-Aug-2018'], ['Reporting Date', 'Derivative Products', 'Buy', 'Sell', 'Open Interest at the'], ['Open Interest at the'], ['No. of Contracts', 'Amount in Crore', 'No. of Contracts', 'Amount in Crore', 'No. of Contracts', 'Amount in Crore'], ['08-Aug-2018', 'Index Futures', '18797.00', '1732.24', '16696.00', '1600.94', '303684.00', '26636.51'], ['Index Options', '495820.00', '50403.69', '512765.00', '52075.29', '673371.00', '60394.18'], ['Stock Futures', '176472.00', '11999.53', '178301.00', '12020.70', '1116162.00', '83275.79'], ['Stock Options', '98471.00', '6949.88', '101906.00', '7204.18', '116286.00', '8824.33'], ['Interest Rate Futures', '0.00', '0.00', '0.00', '0.00', '2530.00', '47.57'], ['The above report is compiled on the basis of reports submitted to depositories by NSE and BSE on 08-Aug-2018 and constitutes  FPIs/FIIs trading / position of the previous trading day.']]
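
The grouping step is easier to follow in isolation. The sketch below runs itertools.groupby over a few hand-made rows shaped like the output above; it is an illustrative variant rather than the exact code from this answer. The helper name split_tables and its marker parameter are assumptions, and it also tolerates empty rows, which would raise an IndexError with x[0].

```python
import itertools


def split_tables(rows, marker="Daily Trends"):
    """Split a flat list of rows into tables, starting a new table at each title row."""

    def is_title(row):
        # an empty row has no first cell, so it can never be a title
        return bool(row) and row[0].startswith(marker)

    tables = []
    # groupby yields consecutive runs of rows that share the same is_title value
    for title_run, run in itertools.groupby(rows, key=is_title):
        run = list(run)
        if title_run or not tables:
            tables.append(run)        # a title run opens a new table
        else:
            tables[-1].extend(run)    # body rows belong to the current table
    return tables


# hand-made rows standing in for the scraped cells
rows = [
    ["Daily Trends in FPI Investments on 08-Aug-2018"],
    ["Reporting Date", "Debt/Equity/Hybrid"],
    ["08-Aug-2018", "Equity"],
    ["Daily Trends in FPI Derivative Trades on 08-Aug-2018"],
    ["Reporting Date", "Derivative Products"],
    [],
    ["08-Aug-2018", "Index Futures"],
]

table1, table2 = split_tables(rows)
```

Because the split relies only on the title text, it breaks if the page changes the wording of the table titles, so the marker is worth checking against the live page.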
