Python: How to deal with right single quotation mark when reading txt
I am using Python to read a txt file that contains a right single quotation mark: ’.
ord("’")
Out[46]: 8217
http://www.fileformat.info/info/unicode/char/2019/index.html
I am reading the txt file with the following code:
with open(text_path, 'r', encoding='utf-8') as f:
    transcript = f.read()
You can write a custom encoding function that converts UTF-8 characters to the ASCII characters specified in a lookup table.
# -*- coding: utf-8 -*-
import io

def encode_file(filepath, conversion_table=None):
    '''Replaces non-ASCII chars with the specified equivalent ASCII chars.'''
    conversion_table = conversion_table or {}
    with io.open(filepath, "r", encoding="utf-8") as f:
        transcript = f.read()
    new_transcript = ""
    for i in transcript:
        try:
            # append character unchanged if it is ASCII
            i.encode("ascii")
            new_transcript += i
        except UnicodeEncodeError:
            # replace with the custom ASCII equivalent,
            # or "?" if no conversion is found
            new_transcript += conversion_table.get(i, "?")
    return new_transcript

text_path = "/path/to/file.txt"
conversion_table = {'ü':'u', 'ô':'o', 'é':'e', 'į':'i'}
print(encode_file(text_path, conversion_table))
For example, for a file with the content
my ünicôdé strįng
this produces
my unicode string
So you can add '’':'\'' (or any other conversion) to conversion_table, and it will do the replacement for you.
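
As a rough sketch of that last step, the same lookup-table idea can be applied to a string that is already in memory. The helper name to_ascii and the sample text are made up for illustration. It uses Python 3's built-in str.translate rather than the character loop above, and it assumes every key in the table is a single character. Characters with no entry fall back to "?", as in the function above.

```python
def to_ascii(text, conversion_table):
    """Replace non-ASCII characters via a lookup table; unknown ones become '?'."""
    # str.translate expects a mapping keyed by Unicode code point
    table = {ord(src): dst for src, dst in conversion_table.items()}
    translated = text.translate(table)
    # anything still outside ASCII had no entry in the table
    return "".join(ch if ord(ch) < 128 else "?" for ch in translated)


# U+2019 is the right single quotation mark (ord == 8217)
conversion_table = {"\u2019": "'", "ü": "u"}
print(to_ascii("It\u2019s a ünicode café", conversion_table))
# prints: It's a unicode caf?
```

The "é" comes out as "?" only because it was deliberately left out of this sample table; adding 'é':'e' would convert it as well.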