
Need to extract all links from <script> tags in HTML (Python)

Basically, I need to extract the src="" value from every <script> tag in an HTML page, for example:

<script src="path/to/example.js" type="text/javascript"></script>

Unfortunately, I couldn't get bs4 to do that. Any ideas how I can achieve this?
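For comparison, here is a minimal sketch that uses only Python's standard-library html.parser. This is a swapped-in technique, not the approach used in either answer, and the class and function names are hypothetical. It collects the src of every <script> start tag and runs on the sample tag above without a network request:

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collects the src value of every <script> tag that has one."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of
        # (name, value) pairs, where value is None for a bare attribute.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.srcs.append(value)


def extract_script_srcs(html):
    """Return the src values of all <script> tags in an HTML string."""
    parser = ScriptSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


sample = '<script src="path/to/example.js" type="text/javascript"></script>'
print(extract_script_srcs(sample))  # ['path/to/example.js']
```

Inline scripts without a src, and src attributes on other tags such as <img>, are skipped.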

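import requests
import bs4

# Download the page and parse it with Python's built-in HTML parser
text = requests.get('http://example.com').text
soup = bs4.BeautifulSoup(text, features='html.parser')

# Keep the src of every <script> tag that actually has one
scripts = soup.find_all('script')
srcs = [script['src'] for script in scripts if 'src' in script.attrs]
print(srcs)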
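The src values collected this way are returned exactly as written in the page, so a relative path such as path/to/example.js is not yet a usable URL. Below is a minimal sketch, assuming you want absolute URLs, that resolves each value against the page URL with the standard-library urllib.parse.urljoin. The helper name absolutize and the sample URLs are hypothetical:

```python
from urllib.parse import urljoin


def absolutize(page_url, srcs):
    """Resolve each src (possibly relative) against the page it came from."""
    return [urljoin(page_url, src) for src in srcs]


srcs = [
    'path/to/example.js',             # relative to the page's directory
    '/static/app.js',                 # relative to the site root
    'https://cdn.example.org/lib.js', # already absolute, left unchanged
    '//cdn.example.net/x.js',         # scheme-relative, takes the page's scheme
]
print(absolutize('http://example.com/blog/post.html', srcs))
# ['http://example.com/blog/path/to/example.js', 'http://example.com/static/app.js',
#  'https://cdn.example.org/lib.js', 'http://cdn.example.net/x.js']
```

If requests followed a redirect, the response's url attribute is probably the better base. A <base href> tag in the page would also change the base, and this sketch ignores that case.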

I would condense this and use the CSS selector script[src], which matches only <script> tags that have a src attribute:

import requests
from bs4 import BeautifulSoup as bs
r = requests.get('http://example.com').content
soup = bs(r, 'lxml') # 'html.parser' if lxml not installed
srcs = [item['src'] for item in soup.select('script[src]')]
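The script[src] selector checks only whether the attribute is present. A tag written as <script src=""> therefore still matches and yields an empty string, and a page that includes the same file twice yields it twice. Here is a minimal post-processing sketch that drops blank values and duplicates while keeping first-seen order. The helper name clean_srcs is hypothetical:

```python
def clean_srcs(srcs):
    """Drop empty src values and duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for src in srcs:
        src = src.strip()  # treat whitespace-only values as empty
        if src and src not in seen:
            seen.add(src)
            result.append(src)
    return result


print(clean_srcs(['a.js', '', 'b.js', 'a.js', ' b.js ']))  # ['a.js', 'b.js']
```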

