Python writing apostrophes to a file

Question

I'm converting a downloaded Facebook Messenger conversation from JSON to a text file using Python. I've converted the JSON to text and it all looks fine. I need to strip the unnecessary information and reverse the order of the messages, then save the output to a file, which I've done. However, when I look at the output file after formatting the messages with Python, there is sometimes an â in place of an apostrophe.

My Python isn't great as I normally work with Java, so there are probably a lot of things I could improve. If someone could suggest some better tags for this question, I'd also be very appreciative.

Example of apostrophe working: You're not making them are you?

Example of apostrophe not working: Itâs just a button I discovered

What is causing this to happen, and why does it not happen every time there is an apostrophe?

Here is the script:

#!/usr/bin/python3

import datetime

def main():

    input_file = open('messages.txt', 'r')
    output_file = open('results.txt', 'w')

    content_list = []
    sender_name_list = []
    time_list = []

    # Iterate over the file directly so that the first line is not skipped
    for line in input_file:

        if "sender_name" in line:
            values = line.split("sender_name")
            sender_name_list.append(values[1][1:])

        if "timestamp_ms" in line:
            values = line.split("timestamp_ms")
            time_value = values[1]
            timestamp = int(time_value[1:])         
            time = datetime.datetime.fromtimestamp(timestamp / 1000.0)      
            time_truncated = time.replace(microsecond=0)
            time_list.append(time_truncated)    

        if "content" in line:
            values = line.split("content")
            content_list.append(values[1][1:])

    content_list.reverse()
    sender_name_list.reverse()
    time_list.reverse()

    for x in range(1, len(content_list)):
        output_file.write(sender_name_list[x])
        output_file.write(str(time_list[x]))
        output_file.write("\n")
        output_file.write(content_list[x])
        output_file.write("\n\n")


    # Close the files inside main(), where they were opened
    input_file.close()
    output_file.close()

if __name__ == "__main__":
    main()

Edit: The fix was to open both files through codecs with an explicit UTF-8 encoding:

import codecs
input_file = codecs.open('messages.txt', 'r', 'utf-8')
output_file = codecs.open('results.txt','w', 'utf-8')

Answer 1

Without seeing the incoming data it's hard to be sure, but I suspect that instead of an apostrophe (Unicode U+0027 ' APOSTROPHE), you've got its curly equivalent (U+2019 ’ RIGHT SINGLE QUOTATION MARK) in there, and it is being interpreted with an old-fashioned single-byte encoding rather than UTF-8.
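
The suspected mechanism can be reproduced in a few lines. This is only a sketch, assuming the text is stored as UTF-8 and later decoded with a single-byte code page. Windows-1252 and Latin-1 are illustrative guesses here; the encoding actually in use on the asker's machine is not known.

```python
# Assumption: the text was stored as UTF-8 but decoded with a legacy
# single-byte code page. cp1252 and latin-1 are illustrative guesses.
curly = "It\u2019s just a button"      # U+2019 RIGHT SINGLE QUOTATION MARK
straight = "You're not making them"    # U+0027 APOSTROPHE

raw = curly.encode("utf-8")            # U+2019 becomes three bytes: E2 80 99
as_cp1252 = raw.decode("cp1252")       # each byte is turned into one character
as_latin1 = raw.decode("latin-1")      # E2 -> 'â'; 80 and 99 -> invisible control characters

# The plain apostrophe is the single byte 0x27 in all of these encodings,
# so it survives the same round trip unchanged.
straight_roundtrip = straight.encode("utf-8").decode("cp1252")

# ascii() shows non-ASCII characters as escapes so the result is unambiguous.
print(ascii(as_cp1252))
print(ascii(as_latin1))
print(straight_roundtrip)
```

If this is what is happening, it would also account for the inconsistency: messages typed with a plain ' come through intact, while those containing ’ (which many phone keyboards insert automatically) end up with an â followed by characters that may not be visible.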

Instead of

output_file = open('results.txt', 'w')

try

import codecs
output_file = codecs.open('results.txt','w', 'utf-8')

You may also need the equivalent on your input file.
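
In Python 3 the built-in open() also accepts an encoding argument, so codecs.open is not strictly required. Below is a minimal sketch, assuming both files are UTF-8. The helper name copy_utf8 and the temporary files are hypothetical and are there only to make the example self-contained.

```python
import os
import tempfile

def copy_utf8(src_path, dst_path):
    """Copy text from src_path to dst_path, treating both as UTF-8.

    Hypothetical helper for illustration; not part of the original script.
    """
    # The with-statement closes both files even if an error occurs.
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)

# Temporary files stand in for messages.txt and results.txt.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "messages.txt")
dst_path = os.path.join(tmp_dir, "results.txt")

with open(src_path, "w", encoding="utf-8") as f:
    f.write("It\u2019s just a button I discovered\n")

copy_utf8(src_path, dst_path)

with open(dst_path, "r", encoding="utf-8") as f:
    result = f.read()

# ascii() shows the curly quote as \u2019, confirming it was preserved.
print(ascii(result))
```

Passing the encoding explicitly avoids relying on the platform default, which differs between systems and is a likely reason the same script can behave differently on another machine.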

Answer 1
1 2019-03-01 04:27:55
