Python Code to Remove Spaces from Chinese Characters in multiple UTF8 text files
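I am trying to write Python code, in Python 3.7.2, that removes the spaces between all Chinese characters in multiple UTF8 text files in the same directory.
The code I have so far only handles a single file: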
import re

with open("transcript 0623.txt") as text:
    new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
    with open("transcript 0623_out.txt", "w") as result:
        result.write(new_text)
I get the following error:
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\Wave.3\test.py", line 4, in <module>
    new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\Lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
Can you tell me what is going wrong and suggest how to improve the code? Thank you.
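open() returns a file object (source: https://docs.python.org/3/library/functions.html#open ), not the file's text. In your code, re.sub() is passed that file object instead of a string, which is why it raises TypeError. To run a regular-expression operation on the file's contents, first call the file object's .read() method to get the text.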
For example:
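import re

# open() only gives a file object; .read() returns its contents as a string.
with open("transcript 0623.txt", encoding="utf-8") as f:
    text = f.read()
# Remove every space that has no printable-ASCII character on either side.
new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
with open("transcript 0623_out.txt", "w", encoding="utf-8") as result:
    result.write(new_text)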
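The question also asks about handling every text file in a directory, which the single-file fix does not cover. One possible sketch follows, assuming the inputs all end in .txt and are UTF-8 encoded. The names remove_spaces and process_directory, and the "_out" suffix convention for output files, are illustrative choices made here, not part of the original question or answer.

```python
import re
from pathlib import Path

# Same pattern as in the question: a space with no printable-ASCII
# character (space through ~) immediately before or after it.
SPACE_BETWEEN_NON_ASCII = re.compile(r"(?<![ -~]) (?![ -~])")


def remove_spaces(text: str) -> str:
    """Drop spaces that sit between non-ASCII (e.g. Chinese) characters."""
    return SPACE_BETWEEN_NON_ASCII.sub("", text)


def process_directory(directory: str, suffix: str = "_out") -> list:
    """Write a cleaned copy of each .txt file in `directory`; return the new paths.

    Hypothetical helper: output names follow the question's
    "<name>_out.txt" convention.
    """
    written = []
    # sorted() materialises the listing first, so files written during
    # the loop are not picked up by the same pass.
    for path in sorted(Path(directory).glob("*.txt")):
        if path.stem.endswith(suffix):
            continue  # skip output left over from an earlier run
        cleaned = remove_spaces(path.read_text(encoding="utf-8"))
        out_path = path.with_name(path.stem + suffix + path.suffix)
        out_path.write_text(cleaned, encoding="utf-8")
        written.append(out_path)
    return written
```

Calling process_directory(".") would then process the current directory, leaving the original files untouched and writing a "_out" copy next to each one. Passing encoding="utf-8" explicitly matters on Windows, where open() otherwise falls back to the locale's default encoding.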