How to get rid of ascii encoding error in python

string = "Deepika Padukone, Esha Gupta or Yami Gautam - Who's looks hotter and sexier? Vote! - It's ... Deepika Padukone, Esha Gupta or Yami Gautam…. Deepika Padukone, Esha Gupta or Yami Gautam ... Tag: Deepika Padukone, Esha Gupta, Kalki Koechlin, Rang De Basanti, Soha Ali Khan, Yami  ... Amitabh Bachchan and Deepika Padukone to be seen in Shoojit Sircar's Piku ..."

fp = open("test.txt", "w+");

fp.write("%s" %string);

After running the above code, I get the following error.

File "encode_error.py", line 1

SyntaxError: Non-ASCII character '\xe2' in file encode_error.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

You have a U+2026 HORIZONTAL ELLIPSIS character in your string definition:

... Deepika Padukone, Esha Gupta or Yami Gautam…. ...
                                               ^

Python requires that you declare the source code encoding if you are to use any non-ASCII characters in your source.

Your options are to:

  • Declare the encoding, as specified in the linked PEP 263. It is a comment that must be the first or second line of your source file.

    What you set it to depends on your code editor. If you are saving files encoded as UTF-8, then the comment looks something like:

     # coding: utf-8 

    but the format is flexible. You can spell it encoding too, for example, and use = instead of :.

  • Replace the horizontal ellipsis with three dots, as used in the rest of the string.

  • Replace the codepoint with \xhh escape sequences to represent encoded data. U+2026 encoded to UTF-8 is \xe2\x80\xa6.
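The last two options can be sketched as follows. This is only an illustration: the helper name replace_ellipsis and the sample text are made up for the example, and the ellipsis is written as a \u escape so the snippet itself stays pure ASCII and needs no encoding declaration.

```python
# U+2026 HORIZONTAL ELLIPSIS, written as an escape so this file stays pure ASCII.
ELLIPSIS = u"\u2026"

# The UTF-8 encoding of U+2026, spelled with \xhh escapes in a byte string.
ELLIPSIS_UTF8 = b"\xe2\x80\xa6"


def replace_ellipsis(text):
    """Swap every ellipsis character for three plain ASCII dots (hypothetical helper)."""
    return text.replace(ELLIPSIS, u"...")


sample = u"Yami Gautam" + ELLIPSIS + u". Deepika Padukone"
cleaned = replace_ellipsis(sample)

print(cleaned)
# Check that the escaped bytes really are the UTF-8 form of the character.
print(ELLIPSIS.encode("utf-8") == ELLIPSIS_UTF8)
```

Which option fits best depends on whether the ellipsis matters to you: replacing it loses the original character, while the escape form keeps the data and only changes how the source file spells it.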

Add # coding: utf-8 to the top of your file.

# coding: utf-8
string = "Deepika Padukone, Esha Gupta or Yami Gautam - Who's looks hotter and sexier? Vote! - It's ... Deepika Padukone, Esha Gupta or Yami Gautam…. Deepika Padukone$

fp = open("test.txt", "w+");

fp.write("%s" %string);
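Once the source file declares its encoding, it may also help to be explicit about the encoding of the output file. The answer above does not ask for this; the following is a minimal sketch that assumes io.open (available in both Python 2.6+ and Python 3), a shortened sample string, and a throwaway path in the system temp directory.

```python
import io
import os
import tempfile

# Shortened sample text; the ellipsis is escaped so this snippet is pure ASCII.
text = u"Deepika Padukone, Esha Gupta or Yami Gautam\u2026."

# Hypothetical output location, chosen only for this illustration.
path = os.path.join(tempfile.gettempdir(), "test.txt")

# Write the text, stating the file encoding instead of relying on a default.
with io.open(path, "w", encoding="utf-8") as fp:
    fp.write(text)

# Read it back with the same encoding to confirm nothing was lost.
with io.open(path, "r", encoding="utf-8") as fp:
    round_trip = fp.read()

print(round_trip == text)
```

The with blocks close the file automatically, so no explicit fp.close() call is needed.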

Explanation:

The error is caused by standard characters such as the apostrophe (') being replaced during copying by non-ASCII lookalikes such as the curly quotation mark (’). This happens quite often when you copy text from a PDF file. The difference looks subtle, but it matters a great deal to Python: the apostrophe is an ASCII character that can delimit a text string, while the curly quotation mark is a non-ASCII character that Python will not accept in a source file with no declared encoding.
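Because such lookalike characters are hard to spot by eye, a small scan can point them out. The sketch below is one possible approach, not part of the original answer; find_non_ascii is a hypothetical helper and the sample string is made up, using \u escapes for the curly quote and the ellipsis.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, character, Unicode name) for each non-ASCII character in text."""
    hits = []
    for index, char in enumerate(text):
        if ord(char) > 127:  # anything above 127 is outside the ASCII range
            hits.append((index, char, unicodedata.name(char, "UNKNOWN")))
    return hits


# Sample with a curly apostrophe (U+2019) and an ellipsis (U+2026).
sample = u"Who\u2019s looks hotter? Yami Gautam\u2026."

for index, char, name in find_non_ascii(sample):
    print("%d U+%04X %s" % (index, ord(char), name))
```

Running the same function over the text you pasted shows exactly which characters need either replacing or an encoding declaration.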

Technically, it is not illegal to use any characters we want. We just have to tell Python which encoding we are using so that it knows what to do with these non-ASCII characters. Adding # coding: utf-8 to the top of the file tells Python that the source is encoded as UTF-8.
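To see which spellings of the declaration are recognised, the regular expression documented in PEP 263 can be tried against a few candidate lines. This is a sketch for experimentation only: declared_encoding is a hypothetical helper, and real Python applies the check only to the first two lines of a file.

```python
import re

# Pattern documented in PEP 263 for the encoding declaration comment.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding name declared on this line, or None (hypothetical helper)."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


candidates = [
    "# coding: utf-8",
    "# -*- coding: utf-8 -*-",
    "# encoding=utf-8",
    "# just a comment",
]

for line in candidates:
    print("%r -> %r" % (line, declared_encoding(line)))
```

The first three lines all declare utf-8, while the plain comment declares nothing, which is why the format is described as flexible.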

UTF-8 is an encoding format for representing the characters in the Unicode set, and it is used very widely on the web. Unicode is the industry standard for representing and handling text on many different platforms, including the web, enterprise software, and printing. UTF-8 is one of the more popular ways of encoding this character set.
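The distinction between a Unicode character and its UTF-8 bytes can be made concrete with a short sketch. The three sample characters are chosen only for illustration; each is a single code point, but UTF-8 uses a different number of bytes for each.

```python
# One code point each: an ASCII letter, an accented letter, and the ellipsis.
samples = [u"A", u"\u00e9", u"\u2026"]

byte_lengths = []
for char in samples:
    encoded = char.encode("utf-8")  # Unicode text -> UTF-8 bytes
    byte_lengths.append(len(encoded))
    print("U+%04X -> %d byte(s)" % (ord(char), len(encoded)))
```

ASCII characters keep their single-byte form in UTF-8, which is why a file containing only ASCII needs no encoding declaration at all.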
