Python: converting an encoded string to UTF-8?

Question

I have a text file whose encoding I cannot determine. I open it with Python's open function and watch it for any new content.

Here is the program:

import time
import os
import sys
from threading import Timer
from base_logger import logger
from service_check import service_info, machine_name

def follow(thefile):
    '''generator function that yields new lines in a file
    '''
    # seek the end of the file
    thefile.seek(0, os.SEEK_END)
    
    # start infinite loop
    while True:
        # read the next line, if one has been appended
        line = thefile.readline()
        # sleep if file hasn't been updated
        if not line:
            time.sleep(0.1)
            continue

        yield line

def detectLicenseStopping(line):
    stopStr = "Stopping license"

    if stopStr in line:
        logger.info("Detected: " +  str(line) + " " + str(type(line)))


def main(argv):
    logfile = open(argv[0], mode="r")
    loglines = follow(logfile)
    # iterate over the generator
    for line in loglines:
        detectLicenseStopping(line)

if __name__ == '__main__':
    main(sys.argv[1:])

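The variable line holds one line read from the text file. I am trying to detect whether stopStr occurs in that line, but the check fails because line and stopStr look clearly different :(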

This is the value of line during debugging:

'\x002\x005\x00/\x001\x000\x00/\x002\x001\x00 \x002\x002\x00:\x000\x000\x00:\x005\x006\x00 \x00(\x001\x000\x004\x004\x00)\x00 \x00S\x00t\x00o\x00p\x00p\x00i\x00n\x00g\x00 \x00l\x00i\x00c\x00e\x00n\x00s\x00e\x00 \x00F\x00C\x00T\x00C\x001\x002\x000\x001\x000\x000\x000\x004\x002\x007\x000\x004\x005\x005\x007\x009\x003\x009\x005\x007\x00 \x00o\x00n\x00 \x00W\x001\x002\x006\x00P\x00A\x00A\x00P\x001\x00.\x00'

How should I convert line to its proper string (UTF-8) representation? I tried the str function, but it does not seem to help.

Answer 1

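You can use the chardet library to detect the encoding. chardet.detect works on bytes, so open the file in binary mode (mode="rb") so that line is a bytes object rather than a str.

import chardet  # third-party package: pip install chardet
# line must be bytes; "encoding" can be None if detection fails
line_encoding = chardet.detect(line)["encoding"]
utf8_string = line.decode(line_encoding)  # yields a decoded str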
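
The \x00 characters interleaved with the text in the debug output suggest that the file is UTF-16 and is being decoded with a single-byte default encoding. As a minimal sketch that needs only the standard library, and assuming the log really is UTF-16 or UTF-8, the encoding could be guessed from the first bytes of the file and passed to open. The names sniff_encoding and lines_containing are hypothetical helpers, and the NUL-counting heuristic is an assumption that only holds for mostly ASCII text:

```python
import codecs


def sniff_encoding(path, default="utf-8"):
    """Guess the text encoding from the first bytes of a file (heuristic only)."""
    with open(path, "rb") as f:
        head = f.read(64)

    # A byte-order mark, if present, names the encoding directly.
    # (UTF-32 is not handled in this sketch.)
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    # Assumption: mostly ASCII text. In UTF-16 every ASCII character carries
    # one NUL byte: at odd offsets for little-endian, at even offsets for
    # big-endian.
    threshold = len(head) // 4
    if head[1::2].count(0) > threshold:
        return "utf-16-le"
    if head[0::2].count(0) > threshold:
        return "utf-16-be"
    return default


def lines_containing(path, needle):
    """Return the lines of the file at path that contain needle."""
    # With an explicit utf-16-le/-be codec a leading BOM is not stripped; it
    # shows up as '\ufeff' at the start of the first line, which does not
    # affect a substring test.
    with open(path, mode="r", encoding=sniff_encoding(path)) as f:
        return [line for line in f if needle in line]
```

In the question's main, this would amount to open(argv[0], mode="r", encoding=sniff_encoding(argv[0])). The sketch returns an explicit utf-16-le or utf-16-be rather than plain utf-16 because follow seeks to the end of the file before reading, and as far as I can tell the plain utf-16 codec expects to see the BOM at the point where decoding starts.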
