I am trying to take an existing PDF stored on AWS, read it into my backend (Django 1.1, Python 2.7) and add text into the margin. My current code successfully takes in the PDF and adds text to the margin, but it corrupts the PDF:
When opening in the browser: (screenshot not preserved)
When opening in Adobe: (screenshot not preserved)
I have made my own test PDFs with and without predefined fonts and with and without images. The ones with predefined fonts and no images work as expected, but the ones with images produce "There was an error while reading a stream." when opened in Adobe, and the images simply don't show in the browser. I have concluded that missing fonts are the reason for the garbled characters, but I'm not sure why the images aren't showing.
I don't have control over the contents of the PDFs I'm editing, so I can't ensure they only use the predefined fonts, and they will definitely need to have images in them. Below is my code:
from datetime import datetime
from StringIO import StringIO

from django.core.files.storage import default_storage
from django.http import Http404, HttpResponse
from PyPDF2 import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas
from rest_framework import permissions
from rest_framework.views import APIView


class DownloadMIR(APIView):
    permission_classes = (permissions.IsAuthenticated,)

    def post(self, request, format=None):
        data = request.data
        file_path = "some_path"
        temp_file_path = "some_other_path"

        # read your existing PDF
        if default_storage.exists(file_path):
            existing_pdf = PdfFileReader(default_storage.open(file_path, 'rb'))
        else:
            raise Http404("could not find pdf")

        packet = StringIO()
        # create a new PDF with Reportlab
        can = canvas.Canvas(packet)
        # note: the x extent is stored in "height" and the y extent in "width"
        height, width = int(existing_pdf.getPage(0).mediaBox.getUpperRight_x()), int(
            existing_pdf.getPage(0).mediaBox.getUpperRight_y())
        print("width:" + str(width) + " height: " + str(height))
        can.setPageSize([width, height])
        can.rotate(90)
        footer = "Prepared for " + request.user.first_name + " " + request.user.last_name + " on " + datetime.now().strftime('%Y-%m-%d at %H:%M:%S')
        can.setFont("Courier", 8)
        can.drawCentredString(width / 2, -15, footer)
        can.save()
        packet.seek(0)
        new_pdf = PdfFileReader(packet)

        # stamp the one-page overlay onto every page of the original
        output = PdfFileWriter()
        for index in range(existing_pdf.numPages):
            page = existing_pdf.getPage(index)
            page.mergePage(new_pdf.getPage(0))
            output.addPage(page)
            #print("done page " + str(index))

        response = HttpResponse(content_type="application/pdf")
        response['Content-Disposition'] = 'attachment; filename=' + temp_file_path
        output.write(response)
        return response
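As a side note on the overlay geometry, the negative y value in `drawCentredString(width / 2, -15, footer)` follows from `can.rotate(90)`. The sketch below works through that mapping, assuming the usual counterclockwise rotation about the origin. `rotated_point` is a hypothetical helper, not part of ReportLab, and 792 is an assumed example value for the code's `width` (the long edge of a US Letter page in points).

```python
import math


def rotated_point(x, y, degrees):
    """Map a point given in rotated user space back to page coordinates."""
    theta = math.radians(degrees)
    page_x = x * math.cos(theta) - y * math.sin(theta)
    page_y = x * math.sin(theta) + y * math.cos(theta)
    # Round away floating-point noise such as cos(90 deg) = 6e-17.
    return round(page_x, 6), round(page_y, 6)


width = 792  # assumed example value, not taken from the post
print(rotated_point(width / 2, -15, 90))  # (15.0, 396.0)
```

Under these assumptions the footer is centred 15 points in from the left edge, halfway up the page, which is the left margin.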
Using a script I found online, I see that there are unembedded fonts:
Font List
['/MPDFAA+DejaVuSansCondensed', '/MPDFAA+DejaVuSansCondensed-Bold', '/MPDFAA+DejaVuSansCondensed-BoldOblique', '/MPDFAA+DejaVuSansCondensed-Oblique', '/ZapfDingbats']
Unembedded Fonts
set(['/MPDFAA+DejaVuSansCondensed-Bold', '/ZapfDingbats', '/MPDFAA+DejaVuSansCondensed-BoldOblique', '/MPDFAA+DejaVuSansCondensed', '/MPDFAA+DejaVuSansCondensed-Oblique'])
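A report like this is worth double-checking. A six-letter prefix such as `MPDFAA+` marks a font subset, and subsets are normally embedded. The script itself is not shown, so this is only a guess, but one possible cause of a false "unembedded" result is a walker that descends into dictionaries and skips arrays such as `/DescendantFonts`. The sketch below shows the kind of walk such a script performs, here descending into arrays as well. `collect_fonts` and the sample `resources` data are hypothetical and use plain Python dicts and lists, not real PyPDF2 objects. With a real library, indirect references may need resolving first.

```python
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def collect_fonts(obj, fonts=None, embedded=None, seen=None):
    """Walk nested dicts/lists, returning (all font names, embedded font names)."""
    if fonts is None:
        fonts = set()
    if embedded is None:
        embedded = set()
    if seen is None:
        seen = set()
    if isinstance(obj, (dict, list, tuple)):
        if id(obj) in seen:  # guard against cycles in the object graph
            return fonts, embedded
        seen.add(id(obj))
    if isinstance(obj, dict):
        if "/BaseFont" in obj:
            fonts.add(obj["/BaseFont"])
        # A font descriptor that carries a font program means the font is embedded.
        if "/FontName" in obj and any(k in obj for k in FONT_FILE_KEYS):
            embedded.add(obj["/FontName"])
        for key in obj:
            collect_fonts(obj[key], fonts, embedded, seen)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            collect_fonts(item, fonts, embedded, seen)
    return fonts, embedded


# Made-up sample data standing in for a page's /Resources dictionary.
resources = {
    "/Font": {
        "/F1": {
            "/Subtype": "/Type0",
            "/BaseFont": "/MPDFAA+DejaVuSansCondensed",
            "/DescendantFonts": [{
                "/BaseFont": "/MPDFAA+DejaVuSansCondensed",
                "/FontDescriptor": {
                    "/FontName": "/MPDFAA+DejaVuSansCondensed",
                    "/FontFile2": "<font program stream>",
                },
            }],
        },
        "/F2": {"/Subtype": "/Type1", "/BaseFont": "/ZapfDingbats"},
    }
}

fonts, embedded = collect_fonts(resources)
unembedded = fonts - embedded
print(sorted(unembedded))  # ['/ZapfDingbats']
```

In this sample only `/ZapfDingbats` is reported as unembedded. That is expected, because it is one of the standard 14 fonts that PDF viewers supply themselves.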
My questions are: is there a way to extract the embedded fonts from the original PDF and embed them in the new PDF, and is there something I'm doing wrong that causes the images not to embed?
After some testing, I discovered that the problem wasn't with the generated PDF but with returning the PDF as a response. When I saved the PDF to the bucket and downloaded it with the AWS CLI, it worked. I did not figure out how to fix the response so that it properly sends the PDF back to the front end.
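One possible explanation, which the post does not confirm, is that something between the view and the saved file handles the body as text instead of bytes, for example a front-end request that reads the response as a string. That would match the symptoms: ASCII-only parts of a PDF survive such a round trip, while the binary streams for embedded fonts and images do not. Below is a minimal sketch of the effect, using made-up byte strings in place of a real PDF. `text_round_trip` and `fingerprint` are hypothetical helpers.

```python
import hashlib


def text_round_trip(data):
    """Simulate a client that decodes a binary body as UTF-8 text and re-encodes it."""
    return data.decode("utf-8", "replace").encode("utf-8")


def fingerprint(data):
    """Length and SHA-256 digest, handy for comparing server-side and received bytes."""
    return {"length": len(data), "sha256": hashlib.sha256(data).hexdigest()}


# Made-up fragments: a plain-ASCII object and a stream holding arbitrary binary bytes.
ascii_part = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
binary_part = b"stream\n\x89\xff\xd8\xff\xe0\x00\x10\nendstream\n"

print(text_round_trip(ascii_part) == ascii_part)    # True: ASCII is unchanged
print(text_round_trip(binary_part) == binary_part)  # False: invalid bytes were replaced
print(fingerprint(binary_part)["length"], fingerprint(text_round_trip(binary_part))["length"])
```

Comparing such a fingerprint for the bytes written on the server with one for the bytes the front end actually receives would show whether the body is being altered in transit.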