
Reading pickle performs worse in Python 3.6 than in Python 2.7

I've noticed that the pickle-reading part of my code runs slower in Python 3.6 than it did in Python 2.7. It is not really an issue, but I am curious about what causes it and whether there is an explanation for the difference. I ran the following script in both versions to illustrate it. Here, a pickle is loaded into a DataFrame containing 14804726 rows and 10 columns:

import pandas as pd
import time

# Record the wall-clock time before loading
timestart = time.time()
picklefile = r'C:\Users\Me\rawdata.pkl'
rawdata = pd.read_pickle(picklefile)

# Elapsed load time in seconds
print(time.time() - timestart)

This gave the following timings, in seconds:

>>>Output 2.7.14: 14.9129998684
>>>Output 3.6.4: 60.39831018447876

When you read a Python 2 pickle in Python 3, the unpickler has to convert the strings it contains.

In Python 2, strings were represented as a simple stream of bytes; Python 3 instead uses a Unicode representation that can represent a much broader range of characters. When you load a Python 2 pickle in Python 3, it must convert one format to the other.
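To make the conversion concrete, here is a minimal sketch for Python 3. The bytes in PY2_PICKLE are written by hand and are assumed to match what Python 2 would emit for the byte string 'hello' at protocol 2; they do not come from the original post. The encoding argument of pickle.loads controls how such Python 2 strings are turned into Python 3 objects.

```python
import pickle

# Hand-written protocol 2 pickle holding a Python 2 byte string 'hello'
# (assumption: equivalent to what Python 2 would produce for this value).
PY2_PICKLE = b"\x80\x02U\x05helloq\x00."

# Default behaviour: the bytes are decoded to a Python 3 str (encoding="ASCII").
as_text = pickle.loads(PY2_PICKLE)

# encoding="bytes" skips the decoding step and returns the raw bytes.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(type(as_text).__name__, as_text)
print(type(as_bytes).__name__, as_bytes)
```

With a DataFrame of millions of rows, a decoding step of this kind may run once per stored string, which is consistent with the slowdown described in the question.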

This conversion is most likely what is slowing your load down.

Also note that the conversion performed by default may not be the correct one for your data, so you may want to pass additional parameters when loading to ensure the correct encoding is applied.
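As a rough illustration of picking the encoding explicitly in Python 3, the sketch below uses the standard pickle module rather than pandas. The helper name load_py2_pickle and the sample bytes are hypothetical; the bytes are hand-written and assumed to match a Python 2 pickle of the byte string 'caf\xe9'. The choice of "latin1" is only an example and has to match how the original data was actually encoded.

```python
import pickle

# Hand-written protocol 2 pickle of a Python 2 byte string with one
# non-ASCII byte (0xe9); assumed equivalent to Python 2 output.
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


def load_py2_pickle(data, encoding="latin1"):
    """Hypothetical helper: unpickle Python 2 data with an explicit encoding."""
    return pickle.loads(data, encoding=encoding)


# The default encoding is ASCII, so a non-ASCII byte makes the load fail.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 maps every byte to a character, so the load succeeds.
text = load_py2_pickle(PY2_PICKLE)

# encoding="bytes" keeps the original bytes untouched.
raw = load_py2_pickle(PY2_PICKLE, encoding="bytes")

print(default_failed, repr(text), repr(raw))
```

For a pickle stored on disk, the same encoding argument can be passed to pickle.load on a file opened in binary mode.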
