How to extract the following src (iframe) from the code using Python (BeautifulSoup)

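I am trying to extract the "src" attribute of the iframe, but without success. The page is dynamic: the iframe only appears after I run a search.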

Website: http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001

View source: http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001

import requests
from bs4 import BeautifulSoup

r = requests.get("http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001")
arquivo = BeautifulSoup(r.content, "html.parser")
for link in arquivo.find_all("iframe"):
    print(link)

To simulate the POST request that the search form on this site sends, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

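# Copy every <input> that has a value attribute (such as any hidden
# form fields) into the POST payload.
data = {}
for inp in soup.select("input[value]"):
    data[inp["name"]] = inp["value"]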

data["ctl00$MainContent$txtNumero"] = "3001"  # <-- this is your number
data["ctl00$MainContent$ddlEspecie"] = ""
data["ctl00$MainContent$ddlAno"] = ""
data["ctl00$MainContent$txtConteudo"] = ""
data["ctl00$MainContent$txtEmenta"] = ""
data["ctl00$MainContent$imgBuscar.x"] = "1"
data["ctl00$MainContent$imgBuscar.y"] = "9"

soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
print(soup.iframe["src"])

Prints:

../procuradoriacg/Leis\1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf
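
The printed src is a relative path and contains a backslash, so it cannot be passed to requests.get as is. A minimal sketch of turning it into an absolute URL follows. It assumes the backslash is meant as a path separator (which is how browsers generally treat it in http URLs); the names page_url and pdf_url are illustrative and not part of the original answer.

```python
from urllib.parse import urljoin

# URL of the page that contained the iframe (taken from the answer above).
page_url = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"

# The src value exactly as printed; a raw string keeps the backslash literal.
src = r"../procuradoriacg/Leis\1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf"

# Assumption: the backslash is a path separator, so normalize it to "/"
# before resolving the relative path against the page URL.
pdf_url = urljoin(page_url, src.replace("\\", "/"))
print(pdf_url)
```

The resulting URL could then be fetched with requests.get(pdf_url), provided the server actually serves the file at that location.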

EDIT: To get multiple pages:

import requests
from bs4 import BeautifulSoup

url = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

data = {}
for inp in soup.select("input[value]"):
    data[inp["name"]] = inp["value"]

data["ctl00$MainContent$ddlEspecie"] = ""
data["ctl00$MainContent$ddlAno"] = ""
data["ctl00$MainContent$txtConteudo"] = ""
data["ctl00$MainContent$txtEmenta"] = ""
data["ctl00$MainContent$imgBuscar.x"] = "1"
data["ctl00$MainContent$imgBuscar.y"] = "9"


for i in range(3000, 3010):
    data["ctl00$MainContent$txtNumero"] = i

    s = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    if s.find("iframe"):
        print(i, s.iframe["src"])
    else:
        print(i, "Not Found")

Prints:

3000 Not Found
3001 ../procuradoriacg/Leis\1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf
3002 Not Found
3003 ../procuradoriacg/Leis\1994/8279_LEI30031994pag0001_strDocumentoOficial.pdf
3004 Not Found
3005 Not Found
3006 ../procuradoriacg/Leis\1994/8282_LEI30061994pag0001_strDocumentoOficial.pdf
3007 Not Found
3008 Not Found
3009 Not Found
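
The loop above mutates one shared dict on every iteration. As an optional variation, the payload could be built fresh for each number by a small helper. This is only a sketch: build_payload is a hypothetical name, and the __VIEWSTATE and __EVENTVALIDATION entries are placeholders standing in for whatever inputs the initial GET actually returns (they are the usual ASP.NET Web Forms hidden fields, but their presence on this page is an assumption).

```python
def build_payload(hidden_fields, numero):
    """Return a fresh POST payload for one law number (hypothetical helper)."""
    payload = dict(hidden_fields)  # copy, so the shared fields are not mutated
    payload.update({
        "ctl00$MainContent$txtNumero": str(numero),
        "ctl00$MainContent$ddlEspecie": "",
        "ctl00$MainContent$ddlAno": "",
        "ctl00$MainContent$txtConteudo": "",
        "ctl00$MainContent$txtEmenta": "",
        "ctl00$MainContent$imgBuscar.x": "1",
        "ctl00$MainContent$imgBuscar.y": "9",
    })
    return payload


# Placeholder values; in practice these come from the inputs of the first GET.
hidden = {"__VIEWSTATE": "placeholder", "__EVENTVALIDATION": "placeholder"}

payloads = [build_payload(hidden, n) for n in range(3000, 3003)]
print([p["ctl00$MainContent$txtNumero"] for p in payloads])
```

Each payload can then be passed as data= to requests.post(url, data=payload), exactly as in the loop above.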
