Difference in output file size when writing .txt and .xlsx formats

Question

I have tried saving 800 JSON responses both to a .txt file and to an Excel file.

For the .txt file, I am using:

import requests

for activity_id in activity_ids:
    activity_details = requests.get(url, params=activity_id).text
    # 'a' (append) mode: every run adds to whatever test.txt already contains
    with open('test.txt', 'a') as f:
        f.write(activity_details + '\n')

For the Excel file:

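import pandas
import requests

def df_to_excel(df, filename):
    # The context manager saves and closes the file (ExcelWriter.save() was removed in pandas 2.0)
    with pandas.ExcelWriter(filename) as writer:
        df.to_excel(writer, sheet_name='Sheet1')

for activity_id in activity_ids:
    activity_details = requests.get(url, params=activity_id).json()
    df = pandas.json_normalize(activity_details)
    # df_to_excel creates test.xlsx anew on every iteration
    df_to_excel(df, 'test.xlsx')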

Why is there such a large difference in output file size: 6.5 MB for the Excel file versus 30 MB for the .txt file? If anything, I would expect the Excel file to be larger. Is there anything I can do to shrink the .txt output file?

Answer 1

Excel documents (.xlsx) are ZIP archives containing XML files. The size difference you're seeing is the result of that ZIP compression.
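To see this for yourself, here is a minimal sketch that opens an .xlsx with Python's standard `zipfile` module and lists, for every XML part inside it, the uncompressed size next to the size actually stored in the archive. The helper names `xlsx_member_sizes` and `summarize` are hypothetical, not part of any library.

```python
import zipfile
from typing import BinaryIO, List, Tuple, Union


def xlsx_member_sizes(source: Union[str, BinaryIO]) -> List[Tuple[str, int, int]]:
    """Return (member name, uncompressed bytes, compressed bytes) for every
    file stored inside an .xlsx, which is an ordinary ZIP archive."""
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist()]


def summarize(source: Union[str, BinaryIO]) -> None:
    """Print per-member sizes and the totals before and after compression."""
    rows = xlsx_member_sizes(source)
    total_raw = sum(raw for _, raw, _ in rows)
    total_packed = sum(packed for _, _, packed in rows)
    for name, raw, packed in rows:
        print(f"{name:40} {raw:>10} -> {packed:>10} bytes")
    print(f"{'total':40} {total_raw:>10} -> {total_packed:>10} bytes")
```

Running `summarize('test.xlsx')` on the file from the question should show the worksheet XML occupying far less space inside the archive than it would unpacked; the exact figures depend on your data.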

Answer 2

The text file is opened in append mode ('a'). Unless you clear it somewhere at the start of your code, it keeps accumulating text on every run. Additionally, you write the response for every activity ID to the text file, whereas in Excel you appear to overwrite Sheet1 of test.xlsx on each iteration, so the Excel file only stores the data for the last activity ID.
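A minimal sketch of the fix described above, assuming each response can be fetched as a dict. `fetch` is a hypothetical stand-in for `requests.get(url, params=activity_id).json()` from the question, and `save_all_responses` / `records_to_excel` are illustrative names. The text file is opened once in 'w' mode, so each run replaces it instead of appending, and all records are collected first and written to Excel in a single call instead of re-creating test.xlsx for every ID.

```python
import json
from typing import Callable, Iterable, List


def save_all_responses(
    activity_ids: Iterable,
    fetch: Callable[[object], dict],
    path: str,
) -> List[dict]:
    """Fetch every activity, write one JSON document per line to `path`,
    and return the collected records."""
    records = []
    # "w" truncates the file once per run, so earlier runs no longer pile up
    with open(path, "w", encoding="utf-8") as f:
        for activity_id in activity_ids:
            record = fetch(activity_id)  # hypothetical fetch function
            f.write(json.dumps(record) + "\n")
            records.append(record)
    return records


def records_to_excel(records: List[dict], filename: str) -> None:
    """Write all records to a single sheet in one call."""
    import pandas  # needs pandas plus an Excel engine such as openpyxl

    df = pandas.json_normalize(records)  # one row per response
    df.to_excel(filename, sheet_name="Sheet1", index=False)
```

Usage: `records = save_all_responses(activity_ids, fetch, 'test.txt')` followed by `records_to_excel(records, 'test.xlsx')`. With both files holding all the responses, comparing their sizes becomes a fair comparison.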

Edit: And yes, as @Michael stated, Excel files do store compressed data, so they are smaller than a plain-text file holding the same content.
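To shrink the text output itself, one option (a sketch, not something either answer prescribes) is to apply the same kind of DEFLATE compression that ZIP, and therefore .xlsx, uses: Python's standard `gzip` module can write the responses straight into a compressed file. `write_jsonl_gz` and `read_jsonl_gz` are hypothetical helper names, and the one-JSON-document-per-line layout mirrors the question's `f.write(activity_details + '\n')`.

```python
import gzip
import json
from typing import Iterable, List


def write_jsonl_gz(records: Iterable[dict], path: str) -> None:
    """Write one compact JSON document per line into a gzip-compressed file."""
    # "wt" opens the gzip stream in text mode; an existing file at `path` is replaced
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            # separators=(",", ":") drops the spaces json.dumps inserts by default
            f.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_jsonl_gz(path: str) -> List[dict]:
    """Read the records back, one JSON document per line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

The file can be read back with `read_jsonl_gz` or unpacked on the command line with `gunzip`. If you prefer to keep the raw `.text` of each response, you can write those strings into the gzip stream directly; the compact `separators` only help when the responses contain optional whitespace, and most of the savings usually come from the compression itself.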


2 answers

Answer 1: score 2, accepted, 2018-11-10 20:04:27

Answer 2: score 0, 2018-11-10 20:08:11
