
How do I strip garbage characters at the start of a UTF-8 file?

I have the following code in Python 3.9. It works, except that a garbage character appears at the start of my UTF-8-encoded text file, and it makes the code misread the first character of the first line. How do I strip any garbage characters from the beginning of the UTF-8 file I am reading?

Here is the code:

actions = {'#': 'comment', 'A': 'action', 'T': 'text for polly', 'F': 'filename'}
action = "#"
poly_text_received=False
script_line = "none"
line_cnt = 0

with open(input("Enter the script filename: "),'r') as script_file:
    for line in script_file:
        line_cnt = line_cnt + 1
        line = line.strip()
        if not line:  # skip blank lines; line[0] would raise IndexError on them
            continue
        action = actions.get(line[0])
        if action == 'comment':  #Action is a comment
            line = line[1:].lstrip(':')
            print(f'Ignoring comment:  \n'
                  f'     {line}')

Here is a sample of the input file. There is more to the code, but it always looks at the first character of each line and, based on that character, performs a specific action:

#Precede each comment with "#"
#
A:Start of video (show design with component explorer open)
T:Once you identify sets of identical components, you can create your physical reuse source circuit.
F:Start.mp3
#
A: Circle the IO_Port Groups in Component Explorer
T:This design shows four groups of identical components.
F: Circle_IO_Port_Groups.mp3
#
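To see why only the first line is misread, here is a minimal sketch. It assumes the "garbage" is a UTF-8 byte order mark (BOM) that survived decoding as the invisible character U+FEFF; the sample line is hypothetical. Because Python does not treat U+FEFF as whitespace, the line.strip() call in the question leaves it in place, so line[0] is U+FEFF instead of "#".

```python
# Assumption: the file starts with a UTF-8 BOM and was decoded with "utf-8",
# so the BOM is still present as the character U+FEFF.
actions = {'#': 'comment', 'A': 'action', 'T': 'text for polly', 'F': 'filename'}

first_line = '\ufeff#A comment line'    # hypothetical first line as read from the file
print(repr(first_line[0]))              # '\ufeff' rather than '#'
print(repr(first_line.strip()[0]))      # still '\ufeff': str.strip() does not remove it
print(actions.get(first_line[0]))       # None, so the comment is not recognised

# Workaround if the open() call cannot be changed: strip the BOM by hand.
cleaned = first_line.lstrip('\ufeff')
print(actions.get(cleaned[0]))          # comment
```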

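The garbage character is most likely the UTF-8 byte order mark (BOM): the three bytes EF BB BF that some editors, particularly on Windows, write at the start of a UTF-8 file. The Python documentation for the open() function shows that it takes an optional encoding argument, which is used when a file is opened in text mode. If you omit it, Python 3.9 falls back to the platform default returned by locale.getpreferredencoding(False), which is not necessarily UTF-8.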

https://docs.python.org/3/library/functions.html#open
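The sketch below shows how the same BOM-prefixed bytes come out under different encodings. It assumes the file starts with a BOM; cp1252 is used only as an example of a non-UTF-8 default that is common on Windows, and the sample line is hypothetical.

```python
import locale

# The encoding open() uses when none is given (platform dependent).
print(locale.getpreferredencoding(False))

# UTF-8 BOM (bytes EF BB BF) followed by a short line, decoded three ways.
data = b'\xef\xbb\xbf#comment\n'
print(repr(data.decode('cp1252')))     # 'ï»¿#comment\n': BOM appears as three junk characters
print(repr(data.decode('utf-8')))      # '\ufeff#comment\n': BOM kept as one invisible character
print(repr(data.decode('utf-8-sig')))  # '#comment\n': BOM removed
```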

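Pass this argument explicitly as encoding="utf-8-sig". With that codec, Python skips a leading BOM if one is present and otherwise decodes the file as ordinary UTF-8, so the first character of the first line is read correctly and the garbage character never appears. Plain "utf-8" is not enough on its own: it decodes the BOM into the invisible character U+FEFF, which still ends up in line[0].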
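A minimal sketch of the reading loop with the encoding set, assuming the same actions table as the question. The read_script helper and the temporary sample file are hypothetical, added only to make the example self-contained. It also skips blank lines, because line[0] raises IndexError on an empty string.

```python
import os
import tempfile

actions = {'#': 'comment', 'A': 'action', 'T': 'text for polly', 'F': 'filename'}


def read_script(path):
    """Return (action, text) pairs for each non-blank line of the script file."""
    entries = []
    # "utf-8-sig" drops a leading BOM if present and otherwise behaves like "utf-8".
    with open(path, 'r', encoding='utf-8-sig') as script_file:
        for line in script_file:
            line = line.strip()
            if not line:  # blank line: nothing to dispatch on
                continue
            action = actions.get(line[0])
            text = line[1:].lstrip(':').strip()
            entries.append((action, text))
    return entries


if __name__ == '__main__':
    # Write a small BOM-prefixed sample (hypothetical content) to a temporary file.
    sample = '#A comment\nA: Circle the IO_Port Groups\nF: Circle_IO_Port_Groups.mp3\n'
    with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
        tmp.write(b'\xef\xbb\xbf' + sample.encode('utf-8'))
    try:
        for action, text in read_script(tmp.name):
            print(action, '->', text)
    finally:
        os.remove(tmp.name)
```

The same code reads files without a BOM unchanged, so "utf-8-sig" is a safe choice when you do not control which editor produced the file.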
