Getting an error while writing an HTML page from a website into a CSV file

When I try to write the HTML of a web page to my_html.html, the error below pops up. Please guide me on how I can write it successfully.

ERROR:

  File "C:\\Users\\DRB\\AppData\\Local\\Programs\\Python\\Python38-32\\lib\\encodings\\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u21e3' in position 84032: character maps to <undefined>

import requests

def url_to_file(url, fname= "web_txt.html"):
    response = requests.get(url)
    html_text = response.text
    if response.status_code == 200:
        with open(fname, "w") as r:
            r.write(str(html_text))

        return html_text

    return "Failed to perform its task."

url = "https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/"
print(url_to_file(url))

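The error occurs because open(fname, "w") uses the platform's default text encoding (cp1252 here, as the traceback shows), which cannot represent every character in the page. Open the output file in binary mode and save the response's .content instead of .text: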

import requests

def url_to_file(url, fname="web_txt.html"):
    response = requests.get(url)
    html_content = response.content         # <-- use .content
    if response.status_code == 200:
        with open(fname, "wb") as r:        # <-- open file in binary mode
            r.write(html_content)

        return html_content.decode('utf-8', 'ignore')   # <-- decode content as utf-8

    return "Failed to perform its task."

url = "https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/"
print(url_to_file(url))

Prints:

<!DOCTYPE html>
<!--[if IE 7]>
<html class="ie ie7" lang="en-US" prefix="og: http://ogp.me/ns#">
<![endif]-->

...

and saves the page to web_txt.html.
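
If you would rather keep working with response.text, another option is to stay in text mode and pass an explicit encoding to open(). The following is only a minimal sketch, assuming the file should be stored as UTF-8. The save_text helper and the sample string are hypothetical; the string stands in for response.text so the snippet runs without network access. Forcing cp1252 is used here only to mimic the Windows default shown in the traceback.

```python
import os
import tempfile

# Stand-in for response.text: a string containing U+21E3, a character
# that cp1252 cannot represent (assumed sample, not the real page).
html_text = "<p>arrow: \u21e3</p>"

def save_text(text, fname, encoding="utf-8"):
    """Hypothetical helper: write text to fname with an explicit encoding."""
    with open(fname, "w", encoding=encoding) as f:
        f.write(text)

path = os.path.join(tempfile.mkdtemp(), "web_txt.html")

# Reproduce the failure by forcing the cp1252 codec.
try:
    save_text(html_text, path, encoding="cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# The same write goes through once the file is opened as UTF-8.
save_text(html_text, path, encoding="utf-8")

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

In the original function this would amount to changing open(fname, "w") to open(fname, "w", encoding="utf-8"). Unlike the binary-mode version, this writes the text as requests decoded it, so the bytes on disk may differ from what the server sent if the page was not UTF-8 to begin with.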
