
How to scrape a website to which I post information

I want to scrape announcement information from https://nseindia.com/corporates/corporateHome.html?id=allAnnouncements . Specifically, I want to go to the Corporate Information tab on the left-hand side of the website and then open the Corporate Announcements link under Equities. After that, I want to post a certain equity symbol in the text box and download the output through the export CSV link on the left-hand side of the page.

I am struggling to understand how to navigate to this particular page in the first place, since all the pages have the same URL, https://nseindia.com/corporates/corporateHome.html?id=allAnnouncements . I have been using the Network tab of Chrome's inspector to find out how to get to the particular page from the above link. After doing some research in the Network tab:

Where the web page actually navigates to.

Inspecting the network requests to find the link.

I need to know how to request it.

I expect the script to navigate to that particular page and then post the symbols in order to download the announcements through the CSV link.

You found a good URL. It returns the data in JSON format, but this JSON contains some mistakes, so the standard json module can't read it. With the dirtyjson module I can read it.

import requests
import dirtyjson  # third-party module, installed separately

url = 'https://nseindia.com/corporates/corpInfo/equities/getAnnouncements.jsp?period=Latest%20Announced'

r = requests.get(url)

# r.json() and json.loads(r.text.strip()) both fail here because the
# JSON data has some mistakes, so parse it with dirtyjson instead
data = dirtyjson.loads(r.text)

for item in data['rows']:
    print(item.keys())
    print(item['sym'])
    print(item['desc'])
    print(item['name'])
    # the first eight characters of 'date' are day, month, year
    print(item['date'][:2], item['date'][2:4], item['date'][4:8])

Some results:

['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
MOTOGENFIN
Updates
The Motor & General Finance Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
KANORICHEM
Address Change
Kanoria Chemicals & Industries Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
JAIHINDPRO
Updates
Jaihind Projects Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BLUECHIP
Appointment
Blue Chip India Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BLUECHIP
Resignation
Blue Chip India Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
JAIHINDPRO
Corporate Insolvency Resolution Process
Jaihind Projects Limited
21 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
PAEL
Updates
PAE Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BANDHANBNK
Updates
Bandhan Bank Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
ALICON
Updates
Alicon Castalloy Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
ADANIENT
Acquisition
Adani Enterprises Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
NLCINDIA
Updates
NLC India Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
SHILPAMED
Updates
Shilpa Medicare Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
KOTHARIPRO
Code of Conduct under SEBI(PIT) Reg., 2015
Kothari Products Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
LEEL
Updates
LEEL Electricals Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
SHILPAMED
Updates
Shilpa Medicare Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
ATULAUTO
Updates
Atul Auto Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
PDPL
Resignation
Parenteral Drugs (India) Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
IDBI
Updates
IDBI Bank Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BLBLIMITED
Updates
BLB Limited
20 04 2019
['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']
BLBLIMITED
Shareholders meeting
BLB Limited
20 04 2019
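
The question asks for the announcements of one symbol saved as CSV. Once the rows are parsed, that can be done locally instead of going through the site's export link. What follows is only a minimal sketch under some assumptions: `filter_by_symbol`, `iso_date` and `rows_to_csv` are hypothetical helper names, the sample rows copy only the `sym`, `desc` and `name` values printed above, and the `date` values are assumed to begin with eight `DDMMYYYY` digits, as the slicing in the loop suggests.

```python
import csv
import io
from datetime import datetime

# Column order taken from the keys printed in the results above.
FIELDS = ['sym', 'desc', 'Ind', 'ISIN', 'name', 'date', 'seqId']

def filter_by_symbol(rows, symbol):
    """Return only the rows whose 'sym' equals symbol (case-insensitive)."""
    wanted = symbol.strip().upper()
    return [row for row in rows if str(row.get('sym', '')).upper() == wanted]

def iso_date(raw):
    """Turn the leading DDMMYYYY digits of a 'date' value into YYYY-MM-DD."""
    return datetime.strptime(str(raw)[:8], '%d%m%Y').date().isoformat()

def rows_to_csv(rows, fields=FIELDS):
    """Serialise rows to CSV text; missing keys become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        record = {key: row.get(key, '') for key in fields}
        if record.get('date'):
            record['date'] = iso_date(record['date'])
        writer.writerow(record)
    return buffer.getvalue()

# Sample data: sym/desc/name come from the results above; the date strings
# are an assumption (only their first eight characters are known).
sample_rows = [
    {'sym': 'BLUECHIP', 'desc': 'Appointment',
     'name': 'Blue Chip India Limited', 'date': '21042019'},
    {'sym': 'BLUECHIP', 'desc': 'Resignation',
     'name': 'Blue Chip India Limited', 'date': '21042019'},
    {'sym': 'PAEL', 'desc': 'Updates',
     'name': 'PAE Limited', 'date': '20042019'},
]

csv_text = rows_to_csv(filter_by_symbol(sample_rows, 'BLUECHIP'))
print(csv_text)
```

With the real response, `data['rows']` would take the place of `sample_rows`, and the returned text can be written to a file opened with `newline=''` so the csv module's line endings are kept as they are.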

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address.

 