How to remove non-ASCII characters from a text file and convert the file to a string using Python

Question

import re
data2 = ''
file = open('twitter.txt', 'r')
for i in file:
    thing = re.sub(r'[^\x00-\x7f]',r'', str(file[i]))
    print(str(thing))

Hi, I'm very new to Python. After scraping a bunch of data from Twitter using Python, I put the data into a text file. The text file ends up with a lot of emojis and other non-ASCII characters that can't be turned into a string. The above code is my attempt to remove the non-ASCII characters and turn the file into a string, but it ends up giving me the error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1607: character maps to <undefined>

How can I remove the non-ASCII characters and then turn the remaining text into a string?

Answer 1

Python 3.6 and earlier

def return_only_ascii(text):
    # Keep only characters whose code point is in the ASCII range (0-127).
    return ''.join([x for x in text if ord(x) < 128])

Python 3.7 and later

def return_only_ascii(text):
    # str.isascii() is available from Python 3.7 onward.
    return ''.join([x for x in text if x.isascii()])

Result

>>> return_only_ascii('José')
'Jos'
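
The UnicodeDecodeError in the question is raised while the file is being read, before any filtering happens, because open() falls back to the platform's default encoding (the 'charmap' codec here). A minimal sketch of reading the whole file into one ASCII-only string follows. It assumes the file was saved as UTF-8, which the question does not confirm, and read_ascii_only is a hypothetical helper name, not part of the original answer.

```python
def read_ascii_only(path, encoding="utf-8"):
    """Read a text file and return its contents as one ASCII-only string."""
    # Assumption: the file is UTF-8. Passing the encoding explicitly avoids the
    # platform default codec, which is where the 'charmap' error comes from.
    # errors="ignore" skips any bytes that are not valid in that encoding.
    with open(path, "r", encoding=encoding, errors="ignore") as f:
        text = f.read()
    # Encoding to ASCII with errors="ignore" drops every non-ASCII character;
    # decoding again turns the remaining bytes back into a str.
    return text.encode("ascii", errors="ignore").decode("ascii")
```

For the file in the question this would be called as data2 = read_ascii_only('twitter.txt'). The encode/decode round trip does the same filtering as return_only_ascii above, so either can be used once the file has been read with an explicit encoding.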
