
Difference in output file size when writing to .txt and .xlsx formats

I have tried saving 800 JSON responses to a .txt file and also to an Excel file.

For txt, I am using:

for activity_id in activity_ids:
    activity_details = requests.get(url, params=activity_id).text
    with open('test.txt', 'a') as f:
        f.write(activity_details + '\n')

For Excel:

def df_to_excel(df, filename):
    writer = pandas.ExcelWriter(filename)
    df.to_excel(writer, 'Sheet1')
    writer.save()

for activity_id in activity_ids:
    activity_details = requests.get(url, params=activity_id).json()
    df = json_normalize(activity_details)
    df_to_excel(df, 'test.xlsx')

Why is there a huge difference in output file size: 6.5 MB for the Excel file and 30 MB for the txt file? If anything, I would expect the Excel file to be larger. Is there something I can do to shrink the txt output file?

Excel documents (.xlsx) are zip files containing XML files. The size difference that you're seeing is a result of the compression from the zip process.
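To get a feel for how much zip compression can matter for repetitive JSON-like text, here is a minimal sketch using only the standard library. The records and field names are invented for illustration and are not the asker's data; the actual ratio depends heavily on the content.

```python
import io
import json
import zipfile

# Hypothetical stand-in for 800 similar responses; the field names are invented.
records = [{"activity_id": i, "status": "ok", "distance_km": 5.0} for i in range(800)]
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")

# Pack the same bytes into an in-memory zip archive using DEFLATE,
# the compression method zip-based containers such as .xlsx typically use.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("responses.txt", raw)

raw_size = len(raw)
zipped_size = len(buffer.getvalue())
print(f"raw: {raw_size} bytes, zipped: {zipped_size} bytes")
```

Because the records share the same keys and similar values, the zipped size should come out well below the raw size, which is the effect described above.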

The text file is being opened in append mode. If you haven't cleared it somewhere at the beginning of your code, it will keep accumulating the output of every run. Additionally, you are writing the response for every activity ID to the text file, whereas it seems like you are overwriting Sheet1 in Excel on each iteration, so the Excel file only stores the info for the last activity ID.
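The effect of append mode across repeated runs can be checked with a small sketch like the one below. The helper name `save_responses` and the placeholder payloads are assumptions made for the illustration; a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile


def save_responses(path, responses, mode):
    """Write one response per line, opening the file a single time."""
    with open(path, mode, encoding="utf-8") as f:
        for text in responses:
            f.write(text + "\n")
    return os.path.getsize(path)


responses = ['{"id": 1}', '{"id": 2}', '{"id": 3}']  # placeholder payloads

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    # Two "runs" in append mode: the second run adds to the first.
    append_first = save_responses(path, responses, "a")
    append_second = save_responses(path, responses, "a")
    # One run in write mode: the file is truncated before writing.
    write_size = save_responses(path, responses, "w")

print(append_first, append_second, write_size)
```

The second append run doubles the file size, while write mode brings it back to the size of a single run. On the Excel side, the analogous fix would probably be to collect all rows first and write the sheet once, rather than writing inside the loop.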

Edit: And yes, as @Michael stated, Excel files do store compressed data, so they are typically smaller than a plain text file with the same content.
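If the goal is simply a smaller text output, one option (not mentioned in the question, so treat it as a suggestion) is to write the text through `gzip` from the standard library. The sample payloads below are invented; the file names only mirror the question's `test.txt`.

```python
import gzip
import json
import os
import tempfile

# Invented sample payloads standing in for the real responses.
responses = [json.dumps({"activity_id": i, "status": "ok"}) for i in range(800)]

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "test.txt")
    gz_path = os.path.join(tmp, "test.txt.gz")

    # Plain text output, written once in write mode.
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write("\n".join(responses) + "\n")

    # "wt" opens the gzip stream in text mode, so str can be written directly.
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write("\n".join(responses) + "\n")

    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)

    # Reading back through gzip returns the original lines.
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        restored = f.read().splitlines()

print(plain_size, gz_size)
```

The trade-off is that the `.txt.gz` file can no longer be opened directly in a plain text editor; it has to be read through `gzip.open` or decompressed first.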
