
How can I get the links from this table's content (I guess it's JavaScript)? (Without Selenium)

I'm trying to get the href values from the contents of this table, but they are not available in the HTML source. [Edited at 3:44 pm, 10/02/2019] I will scrape this site, and others similar to it, on a daily basis and compare the results with the previous day's data, so that I end up with only the new information for each day. [/Edited]

I found a similar (but simpler) solution, but it uses chromedriver (link). I'm looking for a solution that doesn't use Selenium.

Site: http://web.cvm.gov.br/app/esforcosrestritos/#/detalharOferta?ano=MjAxOQ%3D%3D&valor=MTE%3D&comunicado=MQ%3D%3D&situacao=Mg%3D%3D

If you click on the first part of the table (as shown below): [image]

You will get to this site: http://web.cvm.gov.br/app/esforcosrestritos/#/enviarFormularioEncerramento?type=dmlldw%3D%3D&ofertaId=ODc2MA%3D%3D&state=eyJhbm8iOiJNakF4T1E9PSIsInZhbG9yIjoiTVRFPSIsImNvbXVuaWNhZG8iOiJNUT09Iiwic2l0dWFjYW8iOiJNZz09In0%3D

How can I scrape the first site to get all the links it has in its tables (so that I can follow them to the second set of pages)?

When I use requests.get it doesn't even get the content of the table. Any help?

import requests

link_cvm = "http://web.cvm.gov.br/app/esforcosrestritos/#/detalharOferta?ano=MjAxOQ%3D%3D&valor=MTE%3D&comunicado=MQ%3D%3D&situacao=Mg%3D%3D"
html_code = requests.get(link_cvm)
# Print the response body; the table content is missing from it
print(html_code.text)

The second page you are taken to is loaded dynamically using JavaScript. The data you are looking for is served from another URL, in JSON format. Search around, since there is a lot of information about this; for one example among many, see this.
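One reason requests.get returns no table is that everything after the # in the URL is a fragment, which the browser keeps for its own JavaScript and which is not sent to the server. A minimal sketch, using only the standard library, of what the server is actually asked for (the variable names requested_path and client_side_route are just illustrative):

```python
from urllib.parse import urlsplit

link_cvm = (
    "http://web.cvm.gov.br/app/esforcosrestritos/#/detalharOferta"
    "?ano=MjAxOQ%3D%3D&valor=MTE%3D&comunicado=MQ%3D%3D&situacao=Mg%3D%3D"
)

parts = urlsplit(link_cvm)

# The path is the only part of this URL that reaches the server.
requested_path = parts.path

# Everything after '#' (including the '?...' that follows it) is the
# fragment; it is handled by the page's JavaScript in the browser.
client_side_route = parts.fragment

print(requested_path)
print(client_side_route)
```

So the request only ever fetches the application shell, and the table is filled in later by separate requests made from the browser.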

In your case, you can get to it this way:

import requests
import json

url = 'http://web.cvm.gov.br/app/esforcosrestritos/enviarFormularioEncerramento/getOfertaPorId/8760'
resp = requests.get(url)

data = json.loads(resp.content)
print(data)

The output is the information on that page.
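The 8760 in that URL seems to come from the ofertaId parameter of the second link in the question, which looks like Base64 text that has been percent-encoded. A rough sketch of that decoding, assuming other offers follow the same pattern; decode_fragment_params is a hypothetical helper name, and the long state parameter is left out of the URL here for brevity:

```python
import base64
from urllib.parse import parse_qs, urlsplit

# Second link from the question, without its 'state' parameter.
form_url = (
    "http://web.cvm.gov.br/app/esforcosrestritos/#/enviarFormularioEncerramento"
    "?type=dmlldw%3D%3D&ofertaId=ODc2MA%3D%3D"
)

def decode_fragment_params(url):
    """Return the Base64-decoded parameters found after '#...?' in url."""
    fragment = urlsplit(url).fragment
    _, _, query = fragment.partition("?")
    params = parse_qs(query)  # also undoes percent-encoding (%3D -> '=')
    return {
        name: base64.b64decode(values[0]).decode("utf-8")
        for name, values in params.items()
    }

decoded = decode_fragment_params(form_url)
offer_id = decoded["ofertaId"]

# Same endpoint as in the answer, rebuilt from the decoded id.
json_url = (
    "http://web.cvm.gov.br/app/esforcosrestritos/"
    "enviarFormularioEncerramento/getOfertaPorId/" + offer_id
)
print(decoded)
print(json_url)
```

The endpoint that lists every offer on the first page is not given here; the browser's network tab, opened while the table loads, is the usual place to look for it.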
