How to remove junk characters at the start of a UTF-8 file

Question

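I have the following code in Python 3.9, and it works, except that I get a junk character at the start of my UTF-8 encoded text file, which causes the first character of the first line to be read incorrectly. How can I remove any junk characters at the start of the UTF-8 file I am reading?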

Here is the code:

actions = {'#': 'comment', 'A': 'action', 'T': 'text for polly', 'F': 'filename'}
action = "#"
poly_text_received=False
script_line = "none"
line_cnt = 0

with open(input("Enter the script filename: "),'r') as script_file:
    for line in script_file:
        line_cnt = line_cnt + 1
        line = line.strip()
        action = actions.get(line[0])
        if action == 'comment':  #Action is a comment
            line = line[1:].lstrip(':')
            print(f'Ignoring comment:  \n'
                  f'     {line}')

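Here is a sample of the input file. There is more to the code; it always looks at the first character of each line and performs a specific action based on that character: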

#Preceed each comment with "#"
#
A:Start of video (show design with component explorer open)
T:Once you identify sets of identical components, you can create your physical reuse source circuit.
F:Start.mp3
#
A: Circle the IO_Port Groups in Component Explorer
T:This design shows four groups of identical components.
F: Circle_IO_Port_Groups.mp3
#

Answer 1

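If you look at the Python documentation for the open() function, you will see that it takes an additional parameter for the file encoding, which becomes relevant when a file is opened in text mode.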

https://docs.python.org/3/library/functions.html#open

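Using this additional parameter, you can set the encoding explicitly. The junk character at the start of the file is most likely a UTF-8 byte order mark (BOM). Opening the file with encoding="utf-8-sig" skips a leading BOM, whereas plain "utf-8" decodes it as the character U+FEFF and leaves it at the start of the first line. With "utf-8-sig" you should be able to read the text without ever seeing the junk character.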
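As a minimal sketch of this approach, assuming the junk character really is a UTF-8 BOM: the helper name read_script_lines and the demo function below are hypothetical and not part of the original code. The helper opens the file with encoding="utf-8-sig" and splits each non-empty line into its first character and the remaining text.

```python
import codecs
import os
import tempfile


# Hypothetical helper, not from the original post: reads a script file and
# returns (first_character, rest_of_line) pairs for every non-empty line.
def read_script_lines(path):
    entries = []
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present;
    # a file without a BOM is read the same way as with "utf-8".
    with open(path, "r", encoding="utf-8-sig") as script_file:
        for line in script_file:
            line = line.strip()
            if not line:  # skip blank lines so line[0] cannot raise IndexError
                continue
            entries.append((line[0], line[1:].lstrip(":").strip()))
    return entries


def demo():
    # Write a small sample file that starts with the UTF-8 BOM bytes.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(codecs.BOM_UTF8 + b"#Comment line\nA:Start of video\n")
        path = tmp.name
    try:
        return read_script_lines(path)
    finally:
        os.remove(path)
```

Calling demo() returns [('#', 'Comment line'), ('A', 'Start of video')], so the first character of the first line is '#' rather than the BOM. In the original loop, the equivalent change would be to pass encoding="utf-8-sig" to the existing open() call.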
