简体   繁体   中英

Convert/Write PDF to RAM as file-like object for further working with it

My script generates PDF ( PyPDF2.pdf.PdfFileWriter object ) and stores it in the variable. I need to work with it as file-like object further in script. But now I have to write it to HDD first. Then I have to open it as file to work with it.

To prevent this unnecessary writing/reading operations I found many solutions - StringIO , BytesIO and so on. But I cannot find what exactly can help me in my case.

As far as I understand - I need to "convert" (or write to RAM) PyPDF2.pdf.PdfFileWriter object to file-like object to work directly with it.

Or there is another method that fits exactly to my case?

UPDATE - here is code-sample

from pdfrw import PdfReader, PdfWriter, PageMerge
from PyPDF2 import PdfFileReader, PdfFileWriter


red_file = PdfFileReader(open("file_name.pdf", 'rb'))

large_pages_indexes = [1, 7, 9]

large = PdfFileWriter()
for i in large_pages_indexes:
    p = red_file.getPage(i)
    large.addPage(p)

# here final data have to be written (I would like to avoid that)
with open("virtual_file.pdf", 'wb') as tmp:
  large.write(tmp)

# here I need to read exported "virtual_file.pdf" (I would like to avoid that too)
with open("virtual_file.pdf", 'rb') as tmp:
  pdf = PdfReader(tmp) # here I'm starting to work with this file using another module "pdfrw"
  print(pdf)

To avoid slow disk I/O it appears you want to replace

with open("virtual_file.pdf", 'wb') as tmp:
  large.write(tmp)

with open("virtual_file.pdf", 'rb') as tmp:
  pdf = PdfReader(tmp)

with

buf = io.BytesIO()
large.write(buf)
buf.seek(0)
pdf = PdfReader(buf)

Also, buf.getvalue() is available to you.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM