
How to find the file name for files generated by pdf2image

I am trying to convert my PDF files to JPG. I first use pdf2image to save each file as a .ppm, and then I want to use PIL to convert the .ppm to .jpg.

How do I find the name of the file that pdf2image saved?

Here is my code:

from pdf2image import convert_from_path
from PIL import Image

def to_jpg(just_ids):
    for just_id in just_ids:
        image = convert_from_path('/Users/davidtannenbaum/Desktop/scraped/{}.pdf'.format(just_id), output_folder='/Users/davidtannenbaum/Desktop/scraped/')
        file_name = ?  # <- how do I get the name pdf2image used?
        im = Image.open("/Users/davidtannenbaum/Desktop/scraped/{}.ppm".format(file_name))
        im.save("/Users/davidtannenbaum/Desktop/scraped/{}.jpg".format(just_id))
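If you really do need the names on disk, one approach is to give convert_from_path a prefix you choose through its output_file parameter and then scan output_folder for files starting with that prefix. This is a sketch, not pdf2image's own lookup: find_generated_files and select_generated are hypothetical helpers, and the page ordering assumes pdftoppm's usual `-<page>.<ext>` ending on each page file. Nothing between the prefix and the page number is relied on.

```python
import os
import re
from typing import List

# Assumption: each page file produced via pdftoppm ends in "-<page>.<ext>".
_PAGE_SUFFIX = re.compile(r"-(\d+)\.[^.]+$")


def select_generated(names: List[str], prefix: str, ext: str) -> List[str]:
    """Keep names that start with `prefix` and end in `.ext`, ordered by page number."""
    ext = ext.lower().lstrip(".")
    matches = [n for n in names if n.startswith(prefix) and n.lower().endswith("." + ext)]

    def page_number(name: str) -> int:
        m = _PAGE_SUFFIX.search(name)
        # Names without a recognisable page suffix sort first.
        return int(m.group(1)) if m else -1

    # Sort numerically so "-10" comes after "-2" even without zero padding.
    return sorted(matches, key=lambda n: (page_number(n), n))


def find_generated_files(output_folder: str, prefix: str, ext: str = "ppm") -> List[str]:
    """Return full paths of files in `output_folder` that look like pages for `prefix`."""
    names = select_generated(os.listdir(output_folder), prefix, ext)
    return [os.path.join(output_folder, n) for n in names]
```

Usage would be something like `convert_from_path(pdf, output_folder=out, output_file=just_id)` followed by `find_generated_files(out, just_id)`.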

You don't need to: convert_from_path returns a list of PIL Image objects, one per page, so the image variable already holds them. You can simply do:

# inside the `for just_id in just_ids:` loop, after convert_from_path
for i, im in enumerate(image):
    im.save("/Users/davidtannenbaum/Desktop/scraped/{}_{}.jpg".format(just_id, i))
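Wrapped in a function, the same loop can also return the paths it wrote. This is a sketch assuming `pages` are PIL images such as the list convert_from_path returns; save_pages_as_jpeg is a hypothetical name, and the convert("RGB") call is a defensive addition (JPEG cannot store an alpha channel) that is not part of the answer above.

```python
import os
from typing import Iterable, List, Protocol


class PageImage(Protocol):
    """Anything shaped like a PIL image: has convert() and save()."""

    def convert(self, mode: str) -> "PageImage": ...

    def save(self, fp: str, format: str) -> None: ...


def save_pages_as_jpeg(pages: Iterable[PageImage], just_id: str, out_dir: str) -> List[str]:
    """Save each page as <out_dir>/<just_id>_<i>.jpg and return the written paths.

    i is the 0-based position in `pages`, matching enumerate() in the answer.
    """
    paths = []
    for i, page in enumerate(pages):
        path = os.path.join(out_dir, "{}_{}.jpg".format(just_id, i))
        # Normalise to RGB first so modes with transparency can still be saved as JPEG.
        page.convert("RGB").save(path, "JPEG")
        paths.append(path)
    return paths
```

With pdf2image this would be called as `save_pages_as_jpeg(convert_from_path(pdf_file), just_id, out_dir)`, with no intermediate .ppm lookup needed.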

convert_from_path() has a few more parameters you can use. You can set the paths_only parameter to True and the format parameter fmt to "jpeg".

This saves your images directly to your output folder in JPEG format instead of PPM, and the image variable will contain the path to each image file (inside output_folder) instead of Image objects.

for just_id in just_ids:
    # with paths_only=True, image is a list of paths to the saved .jpg files
    image = convert_from_path('/Users/davidtannenbaum/Desktop/scraped/{}.pdf'.format(just_id), output_folder='/Users/davidtannenbaum/Desktop/scraped/', fmt="jpeg", paths_only=True)
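If you also want the JPEGs named after just_id, as in the question, the returned paths can be renamed afterwards. A minimal sketch, assuming the list is already in page order; rename_pages is a hypothetical helper, not part of pdf2image.

```python
import os
from typing import List


def rename_pages(paths: List[str], just_id: str) -> List[str]:
    """Rename each page file to <same folder>/<just_id>_<i>.jpg and return the new paths.

    `paths` is assumed to be in page order, e.g. the list returned by
    convert_from_path(..., fmt="jpeg", paths_only=True).
    """
    new_paths = []
    for i, old in enumerate(paths):
        new = os.path.join(os.path.dirname(old), "{}_{}.jpg".format(just_id, i))
        # os.replace overwrites an existing target, so re-running does not raise.
        os.replace(old, new)
        new_paths.append(new)
    return new_paths
```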
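
Alternatively, pass output_file so that the generated files start with a name you choose (here, the PDF's own name):

import os
from pdf2image import convert_from_path

pdf_path = '/path/to/pdf_images/'
output_folder = '/path/for/output/images/'

for pdf in os.listdir(pdf_path):
    if not pdf.lower().endswith('.pdf'):
        continue  # skip anything that is not a PDF
    filename = os.path.splitext(pdf)[0]  # file name without the .pdf extension
    # output_file is a file-name prefix, not a full path
    pdfs = convert_from_path(os.path.join(pdf_path, pdf), output_folder=output_folder, output_file=filename, fmt="jpeg")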
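Deriving the prefix with os.path.splitext rather than splitting on the first dot keeps names like `report.v2.pdf` intact, and checking the extension skips stray files in the folder. The pure helper below is a hypothetical sketch of that selection step, kept separate from pdf2image so it can be checked on its own.

```python
import os
from typing import List, Tuple


def pdf_jobs(filenames: List[str]) -> List[Tuple[str, str]]:
    """Pair each PDF file name with a prefix suitable for output_file.

    os.path.splitext strips only the final extension, so "report.v2.pdf"
    yields "report.v2" instead of being cut down to "report".
    """
    jobs = []
    for name in sorted(filenames):
        stem, ext = os.path.splitext(name)
        # Case-insensitive check; files such as .DS_Store have no extension here.
        if ext.lower() == ".pdf":
            jobs.append((name, stem))
    return jobs
```

In the loop above this would become `for pdf, filename in pdf_jobs(os.listdir(pdf_path)):`.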

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
