简体   繁体   English

以“无损”方式将jpg图像存储到pdf文件中

[英]Storing jpg images into a pdf file in a "lossless" way

Given a directory with several jpg files (photos), I would like to create a single pdf file with one photo per page.给定一个包含多个 jpg 文件(照片)的目录,我想创建一个 pdf 文件,每页一张照片。 However, I would like the photos to be stored in the pdf file unchanged;但是,我希望将照片原封不动地存储在 pdf 文件中; ie, I would like to avoid decoding and recoding.即,我想避免解码和重新编码。 So ideally I would like to be able to extract the original jpg files (maybe minus the metadata) from the pdf file, using, eg, a linux command line too like pdfimages .所以理想情况下,我希望能够从 pdf 文件中提取原始 jpg 文件(可能减去元数据),使用例如 linux 命令行太像pdfimages

My ideas so far:到目前为止我的想法:

  • imagemagick convert . imagemagick convert However, I am confused by the compression options: If I choose 100% quality , does it mean that the jpg is internally decoded, and then encoded lossless?但是,我对压缩选项感到困惑:如果我选择 100% quality ,是否意味着 jpg 在内部解码,然后进行无损编码? (Which is obviously not what I want?) (这显然不是我想要的?)
  • pdflatex . pdflatex Some people claim that the graphics package includes images lossless, while other dispute that . 有人声称图形 package 包含无损图像,而其他人则对此表示异议 In any case, pdflatex would be slightly more cumbersome (I would first have to find out the dimensions of the photos, then set the page size accordingly, make sure that ther are no margins, headers etc etc).无论如何,pdflatex 会稍微麻烦一些(我首先必须找出照片的尺寸,然后相应地设置页面大小,确保没有边距、标题等)。

img2pdf

Losslessly convert raster images to PDF without re-encoding PNG, JPEG, and JPEG2000 images.无损地将光栅图像转换为 PDF,无需重新编码 PNG、JPEG 和 JPEG2000 图像。 This leads to a lossless conversion of PNG, JPEG and JPEG2000 images with the only added file size coming from the PDF container itself.这导致 PNG、JPEG 和 JPEG2000 图像的无损转换,唯一增加的文件大小来自 PDF 容器本身。 Other raster graphics formats are losslessly stored using the same encoding that PNG uses.其他光栅图形格式使用与 PNG 相同的编码无损存储。 Since PDF does not support images with transparency and since img2pdf aims to never be lossy, input images with an alpha channel are not supported.由于 PDF 不支持具有透明度的图像,并且 img2pdf 旨在永不丢失,因此不支持具有 alpha 通道的输入图像。

( pdfimages -all does the exact opposite.) pdfimages -all正好相反。)

You could use the following small script which relies on HexaPDF (note: I'm the author of HexaPDF) to do this.您可以使用以下依赖于HexaPDF 的小脚本(注意:我是 HexaPDF 的作者)来执行此操作。

Note: Make sure you have Ruby 2.4 installed, then run gem install hexapdf to install hexapdf.注意:确保你已经安装了 Ruby 2.4,然后运行gem install hexapdf来安装 hexapdf。

Here is the script:这是脚本:

require 'hexapdf'

doc = HexaPDF::Document.new

ARGV.each do |image_file|
  image = doc.images.add(image_file)
  page = doc.pages.add
  iw = image.info.width.to_f
  ih = image.info.height.to_f                                                                                                                             
  pw = page.box(:media).width.to_f
  ph = page.box(:media).height.to_f
  rw, rh = pw / iw, ph / ih
  ratio = [rw, rh].min
  iw, ih = iw * ratio, ih * ratio
  x, y = (pw - iw) / 2, (ph - ih) / 2
  page.canvas.image(image, at: [x, y], width: iw, height: ih)
end

doc.write('images.pdf')

Just supply the images as arguments on the command line, the output file will be named images.pdf .只需在命令行上提供图像作为参数,输出文件将命名为images.pdf Most of the code deals with centering and scaling the images to nicely fit onto the pages.大多数代码处理居中和缩放图像以很好地适应页面。

Another possibility for storing jpg images into a pdf file in a "lossless" way is provided by PoDoFo : PoDoFo 提供了另一种以“无损”方式将 jpg 图像存储到pdf文件的可能性:

podofoimg2pdf is able to perform lossless conversion from JPEG to PDF by embedding the jpg file into the pdf container. podofoimg2pdf能够通过将 jpg 文件嵌入到 pdf 容器中来执行从 JPEG 到 PDF 的无损转换。

podofoimg2pdf
Usage: podofoimg2pdf [output.pdf] [-useimgsize] [image1 image2 image3 ...]

Options:
 -useimgsize    Use the imagesize as page size, instead of A4

Depending on what you wish to do with the files, on windows, if the images are simpler jpeg/gif/tif/png you can store in a cbz, zip, folder or zipped folder and view with SumatraPDF which has the SaveAs PDF option thus all done with one exe.根据您希望对文件执行的操作,在 windows 上,如果图像更简单 jpeg/gif/tif/png,您可以存储在 cbz、zip、文件夹或压缩文件夹中,然后使用具有 SaveAs PDF 选项的 SumatraPDF 查看全部由一个 exe 完成。

在此处输入图像描述

It will fail with files that are viewable but not acceptable as PDF inputs such as webp or heic, so check in the viewer what the filename extension is before.对于可查看但不可接受的 PDF 输入文件(例如 webp 或 heic),它将失败,因此请在查看器中检查文件扩展名之前的内容。

It should in practically all cases be lossless, however you should roundtrip with pdfimage -all to do a file compare between input and output to check there was no need to convert any bytes.它在几乎所有情况下都应该是无损的,但是您应该使用 pdfimage -all 进行往返以在输入和 output 之间进行文件比较以检查是否需要转换任何字节。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM