save html content into a txt file using python3

Question

I'm tired of searching and trying code that keeps giving the same errors, and I really hope someone can help me figure this out. My problem is simple: I'm trying to save HTML to a text file using Python. Here's the code I'm using:

from urllib.request import urlopen as uReq
url1 = 'http://www.marmiton.org/recettes/menu-de-la-semaine.aspx'
page = uReq(url1).read().decode()
f = open("test.html", "w")
f.write(page)
f.close()

But it gives me the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2665' in position 416224: character maps to <undefined>

Answer 1

Here is the updated solution:

Python 2.x:

import urllib

url1 = 'http://www.marmiton.org/recettes/menu-de-la-semaine.aspx'
page = urllib.urlopen(url1).read()
f = open("./test1.html", "w")
f.write(page)
f.close()

Python 3.x:

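import urllib.request
import shutil

url1 = 'http://www.marmiton.org/recettes/menu-de-la-semaine.aspx'
# Open the output in binary mode so the raw bytes are copied without any encoding step.
# The with statement closes both the response and the file, even on error.
with urllib.request.urlopen(url1) as page, open("./test2.html", "wb") as f:
    shutil.copyfileobj(page, f)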

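In Python 3, copying the urllib response into a file opened in binary mode ("wb") avoids the UnicodeEncodeError, because the bytes are written as received and are never re-encoded with the platform's default codec.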
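
If the decoded text is needed anyway (as in the question's code), an alternative is to keep text mode and pass an explicit encoding to open(). Without it, open() falls back to the platform's default encoding, which on Windows is often the 'charmap' codec named in the error. A minimal sketch, assuming UTF-8 is acceptable for the output file; the helper name save_text and the sample string are hypothetical stand-ins for the downloaded page, and nothing here is fetched from the network:

```python
import os
import tempfile


def save_text(text, path, encoding="utf-8"):
    """Write a str to path with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


# Stand-in for the decoded page (hypothetical content); U+2665 is the
# character from the question's error message.
page = "<html><body>I \u2665 recipes</body></html>"

out_path = os.path.join(tempfile.gettempdir(), "test.html")
save_text(page, out_path)
```

Applied to the question's code, this would amount to changing the call to open("test.html", "w", encoding="utf-8").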

Answer 2

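Try requests together with bs4 (BeautifulSoup):

from bs4 import BeautifulSoup
import requests

r = requests.get("https://stackoverflow.com/questions/47503845/save-html-content-into-a-txt-file-using-python")
data = r.text
# Name the parser explicitly; otherwise bs4 picks one itself and emits a warning.
soup = BeautifulSoup(data, "html.parser")
print(soup)
# An explicit encoding avoids depending on the platform default when writing.
with open('/tmp/test.html', 'a', encoding='utf-8') as f:
    f.write(str(soup))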

Answer 3

You mention that leaving out the .decode() call gives you a TypeError. Have you tried taking the HTML content and passing it to the write() method as a string? You could enclose the HTML content in triple quotes so that it is passed as a multiline string.
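
For context on that TypeError: in Python 3, read() on a urlopen response returns bytes, and a file opened in text mode only accepts str. A small sketch of the difference, using a hypothetical bytes literal in place of a real download; the file name test_raw.html is also just an illustration:

```python
import os
import tempfile

# Stand-in for what urlopen(url).read() returns: raw bytes, not str.
raw = "<p>I \u2665 recipes</p>".encode("utf-8")

path = os.path.join(tempfile.gettempdir(), "test_raw.html")

# Text mode ("w") rejects bytes with a TypeError.
try:
    with open(path, "w") as f:
        f.write(raw)
    text_mode_ok = True
except TypeError:
    text_mode_ok = False

# Binary mode ("wb") writes the bytes unchanged; no decode or encode step occurs.
with open(path, "wb") as f:
    f.write(raw)
```

So the undecoded content can be saved as long as the file is opened with "wb" instead of "w".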

3 answers

solution1
1 ACCEPTED 2017-11-27 04:26:21

solution2
0 2017-11-27 04:34:50

solution3
0 2017-11-27 04:46:42
