Scraping subfiles from URL using Python
A webpage I would like to scrape consists of several files:
I'm interested in scraping only the highlighted file, that is: mboxFrame.
My method of scraping pages
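import requests
from bs4 import BeautifulSoup

# URL is the address of the page (not shown in the question).
# verify=False disables TLS certificate verification.
webPage = requests.get(URL, verify=False)
soup = BeautifulSoup(webPage.content, "html.parser")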
is able to scrape only the file mail.html. Is there a way to scrape only what I want?
I would appreciate any hints or tips.
The way to open a file from a server is to request it with a URL. In fact, in the early days of the World Wide Web this was the only way to get content: content creators would put files on servers, and clients would open or download those files. Dynamic processing of URIs and parameters came later. That is why commenters are asking for the URL you use. We want to see it and modify it to show you which parts need to change in order to get that particular file. You can omit the password, or replace it with some other string of letters.
In general, the file you want is under the URL you already use, with the file name appended. If the starting URL is
www.example.com/mail/
then this file would be at
www.example.com/mail/mbox.msc
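As a rough illustration of appending a file name to a directory URL, here is a minimal sketch using the standard library's urljoin. The helper name subfile_url and the https:// scheme are assumptions made for the example, and it only covers a base URL without query parameters.

```python
from urllib.parse import urljoin


def subfile_url(base_url: str, file_name: str) -> str:
    """Return the URL of file_name inside the directory named by base_url.

    Hypothetical helper for illustration; assumes base_url has no query string.
    """
    # urljoin replaces the last path segment unless the base ends with "/",
    # so make sure the base is treated as a directory first.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, file_name)


# The scheme (https) is an assumption; the answer's URLs omit it.
print(subfile_url("https://www.example.com/mail/", "mbox.msc"))
```

The returned string can be passed to requests.get in place of the original page URL.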
Please note that any parameters should follow the path, so
www.example.com/mail?user=hendrra&password=hendras_password
would turn into
www.example.com/mail/mbox.msc?user=hendrra&password=hendras_password
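When the page URL carries parameters, one way to insert the file name into the path while leaving the query string untouched is to split the URL into its components first. This is only a sketch: the function name subfile_url_keeping_query and the https:// scheme are assumptions, and the credentials are the placeholder values from the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def subfile_url_keeping_query(page_url: str, file_name: str) -> str:
    """Append file_name to the path of page_url and keep its query string.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Normalise the path so exactly one "/" separates it from the file name.
    path = parts.path.rstrip("/") + "/" + file_name
    # Reassemble the URL; the parameters stay after the path.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Placeholder credentials taken from the example URLs; https is an assumption.
page = "https://www.example.com/mail?user=hendrra&password=hendras_password"
file_url = subfile_url_keeping_query(page, "mbox.msc")
print(file_url)
```

The resulting file_url can then be fetched with requests.get and parsed with BeautifulSoup in the same way as the original page.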