
Ghostscript or Python: how to combine PDFs of different page sizes into one PDF with a uniform page size?

I searched Stack Overflow for this problem. The closest questions are:
How to set custom page size with Ghostscript
How to convert multiple, different-sized PostScript files to a single PDF?

But these did NOT solve my problem.

The question is plain and simple.
How can we combine multiple PDFs (with different page sizes) into a single PDF in which all pages have the same size?

Example:
two input pdfs are:
hw1.pdf with a single page of size 5.43x3.26 inch (as reported by Adobe Reader)
hw6.pdf with single page of size 5.43x6.51 inch

The pdfs can be found here:
https://github.com/bhishanpdl/Questions

The code is:

gs -sDEVICE=pdfwrite -r720 -g2347x3909 -dPDFFitPage -o homeworks.pdf hw1.pdf hw6.pdf

PROBLEM: The first page is portrait, and the second page is landscape.
QUESTION: How can we make both pages portrait ?

NOTE :
-r720 is pixels/inch.
The size -g2347x3909 was found using this Python script:

import numpy as np

wd = int(np.floor(720 * 5.43))  # 3909
ht = int(np.floor(720 * 3.26))  # 2347

gsize = '-g' + str(ht) + 'x' + str(wd) + ' '
# this gives:  gsize = '-g2347x3909 '

Another Attempt

import subprocess

commands = ('gs -o homeworks.pdf -sDEVICE=pdfwrite -dDEVICEWIDTHPOINTS=674 '
            '-dDEVICEHEIGHTPOINTS=912 -dPDFFitPage '
            'hw1.pdf hw6.pdf')
subprocess.call(commands, shell=True)

This makes both pages portrait, but they do not have the same size.
When I open the output in Adobe Reader, the first page is smaller and the second is full size.
In general, how can we make all the pages the same size?

In the first example, one of the pages is rotated because it fits better that way round. Ghostscript is primarily intended as print software, so the assumption is that you want to print the input. If the output goes to a fixed media size, page fitting is requested, and the content fits the requested media better (i.e. with less scaling) when rotated, then the content will be rotated.

To prevent that, you would need to modify the FitPage behaviour, which is implemented in the procedure pdf_PDF2PS_matrix in /ghostpdl/Resource/Init/pdf_main.ps. You can change that procedure so that it does not rotate the page for a better fit.
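The rotation decision described above can be illustrated with a simplified model. This is only a sketch of the reasoning, not Ghostscript's actual code: the helper names fit_scale and would_rotate are hypothetical, and it assumes pages are being shrunk to fit, so a larger scale factor means less scaling.

```python
def fit_scale(page_w, page_h, media_w, media_h):
    """Largest uniform scale at which the page fits inside the media."""
    return min(media_w / page_w, media_h / page_h)


def would_rotate(page_w, page_h, media_w, media_h):
    """True if the page needs less shrinking after a 90-degree turn."""
    upright = fit_scale(page_w, page_h, media_w, media_h)
    turned = fit_scale(page_h, page_w, media_w, media_h)
    return turned > upright


# Media from the question: -g2347x3909 at -r720, about 3.26 x 5.43 inches.
media = (2347 / 720, 3909 / 720)

for name, size in [("hw1.pdf", (5.43, 3.26)), ("hw6.pdf", (5.43, 6.51))]:
    print(name, "rotated" if would_rotate(*size, *media) else "upright")
```

Under this model only hw1.pdf is turned, because its 5.43x3.26 inch page matches the 3.26x5.43 inch media almost exactly once rotated.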

In the second case you haven't set -dFIXEDMEDIA (-g implies -dFIXEDMEDIA, but -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS do not), so the media size requests in the PDF files override the media size you set on the command line. That is why the pages are not resized. Since the media is then the size requested by the PDF file, the page fits without modification, and -dPDFFitPage does nothing. So you need to set -dFIXEDMEDIA if you use -dDEVICE...POINTS together with any of the FitPage switches.
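As a sketch of that fix, the second attempt could be rebuilt with -dFIXEDMEDIA added. The helper name build_gs_command is hypothetical; the switches are the ones already named in this thread, and the 674x912 point size is simply taken from the question. The block only prints the command and does not run Ghostscript.

```python
import shlex


def build_gs_command(inputs, output, width_pts, height_pts):
    """Build a Ghostscript argument list that fits every page to one media size."""
    return [
        "gs",
        "-o", output,
        "-sDEVICE=pdfwrite",
        f"-dDEVICEWIDTHPOINTS={width_pts}",
        f"-dDEVICEHEIGHTPOINTS={height_pts}",
        "-dFIXEDMEDIA",   # keep the media size given on the command line
        "-dPDFFitPage",   # scale each PDF page to that media
        *inputs,
    ]


cmd = build_gs_command(["hw1.pdf", "hw6.pdf"], "homeworks.pdf", 674, 912)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) avoids shell=True and the quoting problems that come with it. Note that page fitting may still rotate content for a better fit, as explained for the first example.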

You would be better advised to use -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS to set the media size (as in your second attempt), since these do not depend on the resolution (unlike -g), which can be overridden by PostScript input programs. You should not meddle with the resolution without a good reason, so don't set -r720.
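Because the media size is given in points (72 per inch), no resolution enters the calculation. A minimal sketch, assuming the page sizes reported in the question and a hypothetical helper inches_to_points; the inch values shown by a viewer are rounded, so the result is approximate:

```python
POINTS_PER_INCH = 72  # PostScript/PDF points per inch


def inches_to_points(inches):
    """Convert a length in inches to whole points, rounded to nearest."""
    return round(inches * POINTS_PER_INCH)


# Largest width and height among the two pages from the question.
width_pts = inches_to_points(5.43)
height_pts = inches_to_points(6.51)
print(f"-dDEVICEWIDTHPOINTS={width_pts} -dDEVICEHEIGHTPOINTS={height_pts}")
```

Choosing the largest width and height is only one possible target. Any media size works as long as it is the same for every page.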

Please be aware that this process does not 'merge', 'combine' or anything else which implies that the content of the input is unchanged in the output. You should read the documentation on the subject and understand the process before attempting to use this procedure.

You have tagged this question "ghostscript" but I assume by your use of subprocess.call() that you are not averse to using Python.

The pagemerge canvas of the pdfrw Python library can do this. There are some examples of dealing with different sized pages in the examples directory and at the source of pagemerge.py. The fancy_watermark.py shows an example of dealing with different page sizes, in the context of applying watermarks.

pdfrw can rotate, scale, or simply position source pages on the output. If you want rotation or scaling, you can look in the examples directory. (Since this is for homework, for extra credit you can control the scaling and rotation by looking at the various page sizes. :) But if all you want is for every page to be extended to the width and height of the largest one, without scaling, you could do that with this bit of code:

from pdfrw import PdfReader, PdfWriter, PageMerge

pages = PdfReader('hw1.pdf').pages + PdfReader('hw6.pdf').pages
output = PdfWriter()

rects = [[float(num) for num in page.MediaBox] for page in pages] 
height = max(x[3] - x[1] for x in rects)
width = max(x[2] - x[0] for x in rects)

mbox = [0, 0, width, height]

for page in pages:
    newpage = PageMerge()
    newpage.mbox = mbox              # Set boundaries of output page
    newpage.add(page)                # Add one old page to new page
    image = newpage[0]               # Get image of old page (first item)
    image.x = (width - image.w) / 2  # Center old page left/right
    image.y = (height - image.h)     # Move old page to top of output page
    output.addpage(newpage.render())

output.write('homeworks.pdf')

(Disclaimer: I am the primary pdfrw author.)
