How to find the file name for files generated by pdf2image

Question

I am trying to convert my PDF files to JPG. I first use pdf2image to save each file as a .ppm, and then I want to use PIL to convert the .ppm to a .jpg.

How do I find the name of the file that pdf2image saved?

Here is my code:

def to_jpg(just_ids):
    for just_id in just_ids:
        image = convert_from_path('/Users/davidtannenbaum/Desktop/scraped/{}.pdf'.format(just_id), output_folder='/Users/davidtannenbaum/Desktop/scraped/')
        file_name = ?
        im = Image.open("/Users/davidtannenbaum/Desktop/scraped/{}.ppm".format(file_name))
        im.save("/Users/davidtannenbaum/Desktop/scraped/{}.jpg".format(just_id))

Answer 1

You don't need to. convert_from_path returns a list of PIL Image objects, one per page, so the image variable already holds them. Inside your loop you can simply do:

for i, im in enumerate(image):
    im.save("/Users/davidtannenbaum/Desktop/scraped/{}_{}.jpg".format(just_id, i))
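For completeness, here is a minimal sketch of the whole conversion built on that idea. The helper name save_pages_as_jpg and the folder argument are illustrative, not part of pdf2image; the only pdf2image call assumed is convert_from_path, which needs poppler installed.

```python
import os


def save_pages_as_jpg(pages, out_dir, base_name):
    """Save each page image as <base_name>_<index>.jpg and return the paths."""
    paths = []
    for i, page in enumerate(pages):
        path = os.path.join(out_dir, "{}_{}.jpg".format(base_name, i))
        page.save(path, "JPEG")  # PIL's Image.save(fp, format)
        paths.append(path)
    return paths


def to_jpg(just_ids, folder):
    # Imported here so the helper above can be used without pdf2image.
    from pdf2image import convert_from_path  # third-party; requires poppler

    for just_id in just_ids:
        pdf_file = os.path.join(folder, "{}.pdf".format(just_id))
        pages = convert_from_path(pdf_file)  # list of PIL Image objects
        save_pages_as_jpg(pages, folder, just_id)
```

Because you choose the output names yourself, there is no need to look up whatever name pdf2image gave the intermediate .ppm file.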

Answer 2

The convert_from_path() function has a few more parameters you can use. You can set the paths_only parameter to True and the fmt parameter to "jpeg".

This saves your images directly to your output folder in JPG format instead of PPM, and the image variable will contain the paths to each image instead of the image objects.

for just_id in just_ids:
    image = convert_from_path('/Users/davidtannenbaum/Desktop/scraped/{}.pdf'.format(just_id), output_folder='/Users/davidtannenbaum/Desktop/scraped/', fmt="jpeg", paths_only=True)
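If you also want predictable names, the returned paths can be used to rename the generated files. The following is only a sketch: convert_and_rename and its convert argument are hypothetical names introduced here (the argument exists so the function can be tried without pdf2image), and only the fmt and paths_only parameters described above are assumed.

```python
import os


def convert_and_rename(pdf_path, out_dir, base_name, convert=None):
    """Convert a PDF to JPEG files, then rename them to <base_name>_<index><ext>."""
    if convert is None:
        # third-party; requires poppler
        from pdf2image import convert_from_path as convert

    # With paths_only=True the call returns file paths, not Image objects.
    paths = convert(pdf_path, output_folder=out_dir, fmt="jpeg", paths_only=True)

    renamed = []
    for i, old_path in enumerate(paths):
        ext = os.path.splitext(old_path)[1]  # keep whatever extension was written
        new_path = os.path.join(out_dir, "{}_{}{}".format(base_name, i, ext))
        os.replace(old_path, new_path)
        renamed.append(new_path)
    return renamed
```

Called as convert_and_rename(pdf_file, folder, just_id), it returns the list of renamed paths, which answers the original question of where each file ended up.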

Answer 3

import os
from pdf2image import convert_from_path

pdf_path = '/path/to/pdf_images/'
output_folder = '/path/for/output/images/'

for pdf in os.listdir(pdf_path):
    filename = os.path.splitext(pdf)[0]  # base name without the .pdf extension
    # output_file sets the base name of the files written to output_folder
    pdfs = convert_from_path(os.path.join(pdf_path, pdf), output_folder=output_folder, output_file=filename, fmt="jpeg")

3 answers

solution1
1 ACCEPTED 2019-01-20 20:37:26

solution2
0 2020-05-26 20:16:01

solution3
0 2022-01-07 08:02:50
