
Reading pickle performs worse in Python 3.6 than in Python 2.7

I've noticed that the pickle-reading part of my code runs slower in Python 3.6 than it did in Python 2.7. It is not really an issue, but I am curious as to what is causing it and whether there is an explanation for the difference. I've used the following script in both versions to illustrate it. Here, a pickle is loaded into a DataFrame containing 14804726 rows and 10 columns:

import pandas as pd
import time

timestart=time.time()
picklefile=r'C:\Users\Me\rawdata.pkl'  
rawdata = pd.read_pickle(picklefile)

print(time.time()-timestart)

Which gave the following time outputs:

>>>Output 2.7.14: 14.9129998684
>>>Output 3.6.4: 60.39831018447876

When you read a Python 2 pickle in Python 3, it has to convert the strings it contains.

In Python 2, strings were represented as a simple sequence of bytes; Python 3 uses a Unicode representation instead, which can represent a much broader range of characters. When you load a Python 2 pickle in Python 3, it must convert one format to the other.

This conversion is most likely what is slowing your load down.

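Also note that the conversion performed by default may not be the correct one for your data. In Python 3, pickle.load and pickle.loads accept encoding and errors arguments (defaulting to "ASCII" and "strict") that control how Python 2 strings are decoded, so you may wish to set them to ensure the correct encoding is applied.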
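As a rough illustration of how the encoding argument changes the result, here is a minimal sketch using the standard pickle module directly rather than pandas. The byte string PY2_PICKLE is hand-written for this example (it is meant to match what Python 2 would produce for the 8-bit string 'caf\xe9' with protocol 2), and try_load is a hypothetical helper, not part of any library:

```python
import pickle

# Hand-written bytes, assumed equivalent to Python 2's
# pickle.dumps('caf\xe9', protocol=2):
# PROTO 2, SHORT_BINSTRING of length 4, BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


def try_load(data, **kwargs):
    """Return the unpickled object, or the decode error raised while loading."""
    try:
        return pickle.loads(data, **kwargs)
    except UnicodeDecodeError as exc:
        return exc


# Default: Python 2 strings are decoded as ASCII, which fails on byte 0xe9.
default_result = try_load(PY2_PICKLE)

# Latin-1 maps every byte to a character, so decoding always succeeds.
latin1_result = try_load(PY2_PICKLE, encoding="latin1")

# "bytes" skips decoding and returns the raw Python 2 string as bytes.
bytes_result = try_load(PY2_PICKLE, encoding="bytes")

print(type(default_result).__name__)
print(repr(latin1_result))
print(repr(bytes_result))
```

Which encoding is right depends on how the strings were produced in Python 2; "latin1" never fails to decode, but it only gives the intended text if the original bytes really were Latin-1.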
