使用python抓取网页并将结果显示为.html

Question

我创建了一个脚本来显示网站上所有当前的职位空缺。 这样效果很好，并且可以通过SSH垂直打印列表。

但是，我现在需要做的是将此输出保存为无序列表，并将其保存到.html页面。

我正在使用的脚本是：

from lxml import html
import requests

page = requests.get('https://www.fasthosts.co.uk/careers/current-vacancies').text

tree = html.fromstring(page.content)

Vacancies = tree.xpath('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')

print Vacancies

这会将输出打印到屏幕上。

但是我的其他脚本：

import requests
from bs4 import BeautifulSoup

url = 'https://www.fasthosts.co.uk/careers/current-vacancies'
response = requests.get(url)
html = response.content

    soup = BeautifulSoup(response.content, 'html.parser')

output = soup.find ('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')
text, link = output.text, output.get('vacancy.html')

返回此错误：

在第11行的文件“ test2.py”中
文本，链接= output.text，output.get（'vacancy.html'）AttributeError：“ NoneType”对象没有属性“ text”

我现在解决了使用以下脚本将输出保存到.html文件的问题：

from lxml import html
import requests
import urllib2

page = requests.get('https://www.fasthosts.co.uk/careers/current-vacancies')
content = html.fromstring(page.content)
Vacancies = content.xpath('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')

f = open('vacancy.html', 'w')
f.write(str(Vacancies))
f.close

Answer 1

通过使用以下脚本将输出保存到.html文件，解决了该问题：

from lxml import html
import requests
import urllib2

page = requests.get('https://www.fasthosts.co.uk/careers/current-vacancies')
content = html.fromstring(page.content)
Vacancies = content.xpath('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')

f = open('vacancy.html', 'w')
f.write(str(Vacancies))
f.close

基于OP对他们帖子的编辑（可能受@ user3080953的评论影响） 。

使用python抓取网页并将结果显示为.html

问题描述

1 个解决方案

解决方案1
0 2017-10-27 22:44:40

使用python抓取网页并将结果显示为.html

问题描述

1 个解决方案

解决方案1 0 2017-10-27 22:44:40

解决方案1
0 2017-10-27 22:44:40