Using Python to scrape a webpage and save the results to an .html file

I have created a script that displays all current vacancies from a website. It works well and prints the list vertically when run over SSH.

However, what I now need to do is format this output as an unordered list and save it to an .html page.

The script I am using is:

from lxml import html
import requests

# Keep the Response object: lxml needs its raw bytes (.content), not .text
page = requests.get('https://www.fasthosts.co.uk/careers/current-vacancies')

tree = html.fromstring(page.content)

vacancies = tree.xpath('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')

print(vacancies)

This prints the output to the screen.
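The XPath in the script above compares @class as one exact string, so it only matches when the attribute is identical character for character, including the order of the classes. A minimal sketch, assuming the live page still uses a featuredvacancy__title class (the sample markup and the extract_titles helper are hypothetical), shows a more tolerant class test and strips the whitespace that text() nodes often carry:

```python
from lxml import html

# Hypothetical sample markup standing in for the live careers page;
# the real page structure may differ or may have changed since.
SAMPLE = """
<html><body>
  <h1 class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha">
    Support Engineer
  </h1>
  <h1 class="featuredvacancy__title grid-16">Web Developer</h1>
</body></html>
"""

def extract_titles(markup):
    """Return stripped titles of every h1 whose class list includes
    'featuredvacancy__title', regardless of extra classes or their order."""
    tree = html.fromstring(markup)
    # Padding @class with spaces lets XPath 1.0 test for a whole class token,
    # so 'featuredvacancy__title--invert' alone does not count as a match.
    raw = tree.xpath(
        '//h1[contains(concat(" ", normalize-space(@class), " "), '
        '" featuredvacancy__title ")]/text()'
    )
    # text() nodes keep surrounding newlines/indentation, so strip and drop blanks
    return [t.strip() for t in raw if t.strip()]

print(extract_titles(SAMPLE))  # ['Support Engineer', 'Web Developer']
```

The space-padding idiom is a common XPath 1.0 workaround because XPath 1.0 has no built-in class-token test.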

However, my other script:

import requests
from bs4 import BeautifulSoup

url = 'https://www.fasthosts.co.uk/careers/current-vacancies'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(response.content, 'html.parser')

output = soup.find('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')
text, link = output.text, output.get('vacancy.html')

returns this error:

  File "test2.py", line 11, in <module>
    text, link = output.text, output.get('vacancy.html')
AttributeError: 'NoneType' object has no attribute 'text'
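The error arises because BeautifulSoup's find() does not understand XPath: its first argument is a tag name, so passing an XPath string searches for a tag literally named that, finds nothing, and returns None. Calling .text on None then raises the AttributeError. Likewise, .get() reads an HTML attribute of the tag, so output.get('vacancy.html') would not write anything to a file. A sketch of a BeautifulSoup equivalent, assuming hypothetical sample markup and a hypothetical vacancy_titles helper, filters with find_all() and the class_ keyword instead:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.content; the live page may differ.
SAMPLE = """
<h1 class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha">
  Support Engineer
</h1>
<h1 class="featuredvacancy__title grid-16">Web Developer</h1>
"""

def vacancy_titles(markup):
    """Return the stripped text of every <h1> carrying the featuredvacancy__title class."""
    soup = BeautifulSoup(markup, "html.parser")
    # class_ matches if ANY of the element's classes equals the given value,
    # so the extra modifier and grid classes do not need to be listed.
    headings = soup.find_all("h1", class_="featuredvacancy__title")
    return [h.get_text(strip=True) for h in headings]

soup = BeautifulSoup(SAMPLE, "html.parser")
# find() treats its first argument as a tag name, so an XPath string matches nothing.
print(soup.find('//h1[@class="featuredvacancy__title"]/text()'))  # None
print(vacancy_titles(SAMPLE))  # ['Support Engineer', 'Web Developer']
```

Because find() returns None and find_all() returns an empty list when nothing matches, checking the result before using it avoids this kind of AttributeError if the page markup changes.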

I have now managed to save the output to an .html file using the following script:

from lxml import html
import requests

page = requests.get('https://www.fasthosts.co.uk/careers/current-vacancies')
content = html.fromstring(page.content)
vacancies = content.xpath('//h1[@class="featuredvacancy__title featuredvacancy__title--invert grid-16 alpha"]/text()')

# 'with' closes the file automatically when the block exits
with open('vacancy.html', 'w') as f:
    f.write(str(vacancies))
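Passing the list through str() before writing saves the Python list representation (square brackets, quotes and commas) rather than an unordered list. A minimal sketch of turning the scraped titles into a real <ul> page using only the standard library; vacancies_to_html and save_vacancies are hypothetical helper names, and the page skeleton is an assumption:

```python
from html import escape

def vacancies_to_html(titles, page_title="Current vacancies"):
    """Render vacancy titles as a minimal HTML page containing a <ul>."""
    # Escape each title so characters such as & or < cannot break the markup,
    # and skip entries that are only whitespace.
    items = "\n".join(
        f"    <li>{escape(t.strip())}</li>" for t in titles if t.strip()
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{escape(page_title)}</title></head>\n'
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )

def save_vacancies(titles, path="vacancy.html"):
    """Write the rendered page to disk; 'with' closes the file automatically."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(vacancies_to_html(titles))

if __name__ == "__main__":
    # Hypothetical titles; in practice pass the list returned by the XPath query.
    save_vacancies(["Support Engineer", "R&D Analyst"])
```

Importing only escape from the standard html module avoids shadowing lxml's html when both are used in the same script.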

The problem was resolved by saving the output to an .html file with the script shown in the question's edit above. This answer is based on the OP's edit to their post (and was likely influenced by the comment from @user3080953).

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM