Need to extract all links from script tags in HTML with Python

Original post: 2019-05-28 12:41:12. Tags: python, html, parsing

Basically, I need to parse all src="" links from every <script> tag in an HTML page, for example:

<script src="path/to/example.js" type="text/javascript"></script>

Unfortunately, bs4 doesn't seem to be able to do that. Any ideas how I can achieve this?
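
If a dependency-free route is preferred, Python's standard-library html.parser module can collect the same attribute. The sketch below is one possible approach, not taken from the original thread; ScriptSrcParser and extract_script_srcs are hypothetical names introduced for illustration. It subclasses HTMLParser and records the src value of every <script> start tag, skipping scripts without one.

```python
from html.parser import HTMLParser


class ScriptSrcParser(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list
        # of (name, value) pairs, and value is None for a bare attribute.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.srcs.append(value)


def extract_script_srcs(html):
    """Hypothetical wrapper: feed an HTML string and return the src values."""
    parser = ScriptSrcParser()
    parser.feed(html)
    parser.close()
    return parser.srcs


html = (
    '<script src="path/to/example.js" type="text/javascript"></script>'
    "<script>var x = 1;</script>"
    '<script src="/static/app.js"></script>'
)
print(extract_script_srcs(html))  # ['path/to/example.js', '/static/app.js']
```

The inline script without a src attribute is ignored, so only the two external script paths are returned. Self-closing forms such as <script src="x.js"/> are also covered, because HTMLParser's default handle_startendtag calls handle_starttag.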

2 solutions

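import requests
import bs4

# Fetch the page and parse it with Python's built-in html.parser backend
text = requests.get('http://example.com').text
soup = bs4.BeautifulSoup(text, features='html.parser')

# Collect every <script> tag, keeping only those that carry a src attribute
scripts = soup.find_all('script')
srcs = [link['src'] for link in scripts if 'src' in link.attrs]
print(srcs)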
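
To try the same idea without a network request, BeautifulSoup can parse a literal string. The sketch below assumes beautifulsoup4 is installed and uses the sample markup from the question plus two made-up tags. Instead of filtering on link.attrs afterwards, it passes src=True to find_all, which bs4 documents as matching any tag that has the attribute, whatever its value.

```python
from bs4 import BeautifulSoup

# Sample markup: two external scripts and one inline script without src
html = """
<script src="path/to/example.js" type="text/javascript"></script>
<script>console.log("inline, no src");</script>
<script src="https://cdn.example.com/lib.js"></script>
"""

soup = BeautifulSoup(html, "html.parser")
# src=True keeps only <script> tags that actually have a src attribute
srcs = [tag["src"] for tag in soup.find_all("script", src=True)]
print(srcs)  # ['path/to/example.js', 'https://cdn.example.com/lib.js']
```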

I would condense this and use the CSS selector script[src], which only matches script tags that have a src attribute:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://example.com').content
soup = bs(r, 'lxml')  # use 'html.parser' if lxml is not installed
# script[src] selects only <script> tags that have a src attribute
srcs = [item['src'] for item in soup.select('script[src]')]
print(srcs)
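
Both answers return the src values exactly as written in the markup, so a relative path like path/to/example.js from the question stays relative. If absolute URLs are needed (an assumption, the question does not say), the standard library's urllib.parse.urljoin can resolve each value against the page URL. The page URL and src list below are hypothetical sample data.

```python
from urllib.parse import urljoin

# Hypothetical page URL and src values, e.g. as collected by the code above
page_url = "http://example.com/blog/post.html"
srcs = [
    "path/to/example.js",             # relative to the page's directory
    "/static/app.js",                 # relative to the site root
    "https://cdn.example.com/lib.js", # already absolute, left unchanged
    "//cdn.example.com/x.js",         # scheme-relative, inherits "http"
]

absolute = [urljoin(page_url, src) for src in srcs]
for url in absolute:
    print(url)
```

This assumes the page has no <base href="..."> tag; if it does, that href, not the page URL, should be used as the base.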

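Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post. For any questions, contact yoyou2525@163.com.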

Guangdong ICP registration No. 18138465 © 2020-2024 STACKOOM.COM