
Parallelization of PNG file creation with C++, libpng and OpenMP

I am currently trying to implement a PNG encoder in C++ based on libpng that uses OpenMP to speed up the compression process. The tool is already able to generate PNG files from various image formats. I uploaded the complete source code to pastebin.com so you can see what I have done so far: http://pastebin.com/8wiFzcgV

So far, so good! Now, my problem is finding a way to parallelize the generation of the IDAT chunks containing the compressed image data. Usually, the libpng function png_write_row is called in a for-loop with a pointer to the struct that contains all the information about the PNG file and a row pointer with the pixel data of a single image row.

(Lines 114-117 in the Pastebin file)

//Loop through image
for (i = 0, rp = info_ptr->row_pointers; i < png_ptr->height; i++, rp++) {
    png_write_row(png_ptr, *rp);
}

Libpng then compresses one row after another and fills an internal buffer with the compressed data. As soon as the buffer is full, the compressed data gets flushed to the image file in an IDAT chunk.

My approach was to split the image into multiple parts and let one thread compress rows 1 to 10, another thread rows 11 to 20, and so on. But as libpng uses an internal buffer, it is not as easy as I first thought :) I somehow have to make libpng write the compressed data to a separate buffer for every thread. Afterwards I need a way to concatenate the buffers in the right order so I can write them all together to the output image file.

So, does someone have an idea how I can do this with OpenMP and some tweaking to libpng? Thank you very much!

This is too long for a comment but is not really an answer either--

I'm not sure you can do this without modifying libpng (or writing your own encoder). In any case, it will help if you understand how PNG compression is implemented:

At the high level, the image is a set of rows of pixels (generally 32-bit values representing RGBA tuples).

Each row can independently have a filter applied to it -- the filter's sole purpose is to make the row more "compressible". For example, the "sub" filter makes each pixel's value the difference between it and the one to its left. This delta encoding might seem silly at first glance, but if the colours between adjacent pixels are similar (which tends to be the case) then the resulting values are very small regardless of the actual colours they represent. It's easier to compress such data because it's much more repetitive.

Going down a level, the image data can be seen as a stream of bytes (rows are no longer distinguished from each other). These bytes are compressed, yielding another stream of bytes. The compressed data is arbitrarily broken up into segments (anywhere you want!), written to one IDAT chunk each (along with a little bookkeeping overhead per chunk, including a CRC checksum).

The lowest level brings us to the interesting part, which is the compression step itself. The PNG format uses the zlib compressed data format. zlib itself is just a wrapper (with more bookkeeping, including an Adler-32 checksum) around the real compressed data format, deflate (zip files use this too). deflate supports two compression techniques: Huffman coding (which reduces the number of bits required to represent some byte-string to the optimal number given the frequency that each different byte occurs in the string), and LZ77 encoding (which lets duplicate strings that have already occurred be referenced instead of written to the output twice).

The tricky part about parallelizing deflate compression is that in general, compressing one part of the input stream requires that the previous part also be available in case it needs to be referenced. But, just like PNGs can have multiple IDAT chunks, deflate is broken up into multiple "blocks". Data in one block can reference previously encoded data in another block, but it doesn't have to (of course, it may affect the compression ratio if it doesn't).

So, a general strategy for parallelizing deflate would be to break the input into multiple large sections (so that the compression ratio stays high), compress each section into a series of blocks, then glue the blocks together (this is actually tricky since blocks don't always end on a byte boundary -- but you can put an empty non-compressed block (type 00), which will align to a byte boundary, in-between sections). This isn't trivial, however, and requires control over the very lowest level of compression (creating deflate blocks manually), creating the proper zlib wrapper spanning all the blocks, and stuffing all this into IDAT chunks.

If you want to go with your own implementation, I'd suggest reading my own zlib/deflate implementation (and how I use it), which I expressly created for compressing PNGs (it's written in Haxe for Flash but should be comparatively easy to port to C++). Since Flash is single-threaded, I don't do any parallelization, but I do split the encoding up into virtually independent sections ("virtually" because there's a fractional-byte state preserved between sections) over multiple frames, which amounts to largely the same thing.

Good luck!

I finally got it to parallelize the compression process. As mentioned by Cameron in the comment to his answer, I had to strip the zlib header from the zstreams to combine them. Stripping the footer was not required, as zlib offers an option called Z_SYNC_FLUSH that can be used for all chunks (except the last one, which has to be written with Z_FINISH) to flush to a byte boundary. So you can simply concatenate the stream outputs afterwards. Finally, the adler32 checksum has to be calculated over all threads' data and appended to the end of the combined zstreams.

If you are interested in the result, you can find the complete proof of concept at https://github.com/anvio/png-parallel
