Compressing a pkl file

My requirement is to convert a pkl file to a base64 string so that I can return a JSON document containing this string along with some other contents.

{
    'pkl_file': 'pkl_as_base64_string',
    'content1': 'content1_as_base64_string',
    'content2': 'content2_as_base64_string',
    ...
}

I have tried the following code, using https://stackoverflow.com/a/26349372/9316658 as the reference:

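import base64
import pickle

# DIR_PATH is defined elsewhere in the script
with open(DIR_PATH + 'd885d7a4bbb742cbb397c2642339e950.pkl', 'rb') as f:
    data = pickle.load(f)  # this is the call that raises the ImportError
    serialized_str = base64.b64encode(pickle.dumps(data))
    print(serialized_str)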

I get this error when I execute the above code:

Traceback (most recent call last):
File "/home/bhargav/PycharmProjects/Test/export_import.py", line 8, in <module>
    data = pickle.load(f)
ImportError: No module named ml.model.project_model

When I open the pkl file in a text editor, these are the first few lines:

(iml.model.project_model
ProjectModel
p0
(dp1
S'project_predict_pipe'
p2
(iml.pipeline.base
ICVPipeline
p3
(dp4
S'processors'
p5
(lp6
(iml.pi.file.pdf_to_img_pi
PdfFileConvertPI
p7
(dp8
S'process'
p9
Nsba(iml.pi.ocr.file_ocr_pi

I am not sure why Python is interpreting the text inside the pkl file as Python commands (I am new to Python programming and have never dealt with pkl files before). Also, the pkl file is huge (1.2 GB). How do I achieve the pkl to base64 conversion in the most efficient way possible? Any help is appreciated. TIA

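The problem is most likely that the pkl references a type/class that is not available in your environment. A pickle stores only the module path and class name of each object, not the class code itself, so pickle.load() has to import ml.model.project_model to rebuild the object. If you wrote this file, make the missing module importable (probably ml.model.project_model) before loading.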
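
To illustrate why loading fails, here is a minimal sketch that reproduces the same kind of error. The module name demo_project_model and the in-memory module are hypothetical stand-ins for ml.model.project_model, made up only for this demonstration.

```python
import pickle
import sys
import types

# Hypothetical stand-in for ml.model.project_model (name invented for this demo).
module = types.ModuleType("demo_project_model")
ProjectModel = type("ProjectModel", (object,), {"__module__": "demo_project_model"})
module.ProjectModel = ProjectModel
sys.modules["demo_project_model"] = module

# The pickle records "demo_project_model" and "ProjectModel", not the class code.
payload = pickle.dumps(ProjectModel())

# While the module is importable, loading works.
restored = pickle.loads(payload)

# Simulate an environment that does not have the module.
del sys.modules["demo_project_model"]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:
    load_error = exc

print(type(restored).__name__)
print(load_error)
```

The second load fails with an ImportError naming the missing module, which matches the traceback in the question.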

In any case, your code converts the object stored in the pkl to base64, rather than the file itself as you described. For example, if the pkl contains a dictionary d, you load d and then have to call pickle.dumps(d) to turn it back into bytes, because b64encode accepts a string or buffer, not an arbitrary object. That round trip is unnecessary, and it is the pickle.load() step that fails.

So I think what you really want is to take the pkl file you already have (the result of dumping d) and convert the file's content to base64. You don't need to load or dump anything for this; just read the raw bytes:

import base64

# Encode the file's raw bytes; the pickle is never loaded
with open(DIR_PATH + 'd885d7a4bbb742cbb397c2642339e950.pkl', 'rb') as f:
    serialized_str = base64.b64encode(f.read())
    print(serialized_str)
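
Since the file in question is about 1.2 GB, f.read() holds the whole file in memory, and the base64 output is about one third larger (roughly 1.6 GB). One possible way to limit memory use is to encode in chunks and write the result to another file. This is only a sketch: the function name encode_file_in_chunks, the chunk size, and the temporary sample file are assumptions for illustration, not part of the original code.

```python
import base64
import os
import tempfile

# A multiple of 3 bytes, so no '=' padding appears in the middle of the output.
CHUNK_SIZE = 3 * 1024 * 1024


def encode_file_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Write the base64 encoding of src_path to dst_path, one chunk at a time."""
    if chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a multiple of 3")
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(base64.b64encode(chunk))


# Small demonstration on a temporary file standing in for the real pkl.
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "sample.pkl")
dst_path = os.path.join(work_dir, "sample.pkl.b64")
with open(src_path, "wb") as f:
    f.write(os.urandom(1000))

encode_file_in_chunks(src_path, dst_path, chunk_size=300)

# Compare against encoding the whole file in one call.
with open(src_path, "rb") as f:
    whole = base64.b64encode(f.read())
with open(dst_path, "rb") as f:
    chunked = f.read()
```

Because every chunk except the last is a multiple of 3 bytes, the concatenated output is identical to encoding the whole file at once. Building a single JSON document around a string of that size still requires holding it in memory at some point.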

Then the receiving side needs to decode the base64 string (using b64decode), write the result to a file, and open that file with pickle.load() to get the original object (d in my example). This works provided the receiver has the ml.model.project_model module available.
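
The full round trip can be sketched as follows, written for Python 3. The small pickled dictionary, the temporary file names, and the single 'pkl_file' key are assumptions standing in for the real file and response.

```python
import base64
import json
import os
import pickle
import tempfile

# Stand-in for the real .pkl file: a small pickled dictionary in a temp directory.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "model.pkl")
with open(source_path, "wb") as f:
    pickle.dump({"answer": 42}, f)

# Sender: encode the raw file bytes; pickle.load() is never called here.
with open(source_path, "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")
response_body = json.dumps({"pkl_file": encoded})

# Receiver: decode the string, write the bytes back to a file, then unpickle.
received = json.loads(response_body)
restored_path = os.path.join(work_dir, "restored.pkl")
with open(restored_path, "wb") as f:
    f.write(base64.b64decode(received["pkl_file"]))
with open(restored_path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

In Python 3, b64encode returns bytes, so the result is decoded to an ASCII string before it is placed in the JSON document. Only the receiver calls pickle.load(), so only the receiver needs the classes referenced by the pickle.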
