Parsing a page with BeautifulSoup

I'm trying to parse this webpage and extract some of its information:

http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=778253364357513

import requests
from bs4 import BeautifulSoup

# Download the page and parse the HTML the server returns
page = requests.get("http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=778253364357513")
soup = BeautifulSoup(page.content, 'html.parser')

# Find the element that should hold the stock information
All_Information = soup.find(id="MainContent")

print(All_Information)

It seems all the information inside the tags is hidden. When I run the code, this is what is returned:

<div class="tabcontent content" id="MainContent">
<div id="TopBox"></div>
<div id="ThemePlace" style="text-align:center">
<div class="box1 olive tbl z2_4 h250" id="Section_relco" style="display:none"></div>
<div class="box1 silver tbl z2_4 h250" id="Section_history" style="display:none"></div>
<div class="box1 silver tbl z2_4 h250" id="Section_tcsconfirmedorders" style="display:none"></div>
</div>
</div>

Why is the information not there, and how can I find and/or access it?

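The information I assume you are looking for is not in the HTML your request receives. The page loads it with JavaScript, which makes additional requests after the initial load and then fills in the page; requests only downloads the initial HTML and does not run any JavaScript. There are a few ways you can get that information.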
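A quick way to confirm this from Python is to check whether the divs inside #MainContent are empty placeholders. The sketch below runs on the HTML shown in the question; the helper name empty_placeholders is hypothetical, and "empty" here is taken to mean a div with no child tags and no text.

```python
from bs4 import BeautifulSoup

# Copy of the #MainContent HTML shown in the question.
RETURNED_HTML = """
<div class="tabcontent content" id="MainContent">
<div id="TopBox"></div>
<div id="ThemePlace" style="text-align:center">
<div class="box1 olive tbl z2_4 h250" id="Section_relco" style="display:none"></div>
<div class="box1 silver tbl z2_4 h250" id="Section_history" style="display:none"></div>
<div class="box1 silver tbl z2_4 h250" id="Section_tcsconfirmedorders" style="display:none"></div>
</div>
</div>
"""


def empty_placeholders(html, container_id="MainContent"):
    """Return the ids of divs inside the container that have no child tags and no text."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        return []
    empty = []
    for div in container.find_all("div"):
        # find(True) returns the first descendant tag, or None if there is none
        if div.find(True) is None and not div.get_text(strip=True):
            empty.append(div.get("id"))
    return empty


print(empty_placeholders(RETURNED_HTML))
```

Every section div comes back empty, which is what you would expect if the content is added later by the page's scripts.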

You can try Selenium. It is a Python package that automates a real web browser, so the page can load all of its information (including the parts added by JavaScript) before you try to scrape it.
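A minimal Selenium sketch, assuming Selenium 4 and Google Chrome are installed. The helper names fetch_rendered_html and main_content are hypothetical, and the wait condition (some visible text inside #MainContent) is only a guess at when the data has arrived; adjust it to the element you actually need.

```python
from bs4 import BeautifulSoup

URL = "http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=778253364357513"


def fetch_rendered_html(url, timeout=20):
    """Open the page in Chrome and return its HTML after the page's scripts have run."""
    # Imported here so main_content() below can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # Selenium 4.6+ can locate a matching chromedriver itself
    try:
        driver.get(url)
        # Assumption: the data has loaded once #MainContent shows some visible text.
        # until() keeps calling the lambda until it returns a truthy value or times out.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.ID, "MainContent").text.strip()
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out


def main_content(html):
    """Return the #MainContent element from an HTML string, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(id="MainContent")
```

For example, print(main_content(fetch_rendered_html(URL))) would print the container as it looks after the page's JavaScript has run. Driving a full browser is much slower than a plain requests call, so this approach suits occasional scraping better than frequent polling.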

Another way is to reverse-engineer the website and find out where it gets the information you need, for example by watching the Network tab of your browser's developer tools while the page loads.

Have a look at this link: http://www.tsetmc.com/tsev2/data/instinfofast.aspx?i=778253364357513&c=57+

Your page calls it every few seconds, and it appears to contain all the pricing information you are looking for. It may be easier to request that URL directly to get your information.
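A minimal sketch of calling that URL directly with requests. The c=57+ parameter is copied as-is from the link above (its meaning is not explained here), and the assumption that the response is plain text with sections separated by ';' and fields by ',' should be checked against a real response before relying on it. The function names fetch_instinfo and split_sections are hypothetical.

```python
import requests

INSTINFO_URL = "http://www.tsetmc.com/tsev2/data/instinfofast.aspx"


def fetch_instinfo(instrument_id):
    """Return the raw text of the endpoint the page polls, for one instrument id."""
    # The query string is built by hand so "c=57+" is sent exactly as in the link;
    # passing it via params= would encode "+" as "%2B".
    url = f"{INSTINFO_URL}?i={instrument_id}&c=57+"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # raise an error for 4xx/5xx responses
    return resp.text


def split_sections(text, section_sep=";", field_sep=","):
    """Split delimited text into sections, each a list of string fields.

    Assumption: ';' separates sections and ',' separates fields.
    """
    return [section.split(field_sep) for section in text.split(section_sep)]
```

For example, split_sections(fetch_instinfo("778253364357513")) would return the response broken into lists of strings. Which position holds which price has to be worked out by comparing the values with what the page displays.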
