Does jPod merge PDFs via streaming?

Question

I am using jPod to merge my PDF documents. I merged 400 PDFs of 20 pages each, which produced a 190 MB file, even though a single source PDF is only 38 KB. I checked the heap status in my IDE and did not get any out-of-memory error. I then ran the same code in Apache Tomcat with almost 30 clients, and Tomcat stopped serving requests. Is this because jPod doesn't use streaming, or is there some other reason?

import java.util.UUID;

import de.intarsys.pdf.content.CSContent;
import de.intarsys.pdf.pd.PDDocument;
import de.intarsys.pdf.pd.PDPage;
import de.intarsys.pdf.pd.PDPageTree;
import de.intarsys.pdf.pd.PDResources;
import de.intarsys.tools.locator.FileLocator;

private void run() throws Throwable {
    String sOutFileFullPathAndName = "/Users/test/Downloads/" + UUID.randomUUID().toString().replace("-", "");
    PDDocument dstDocument = PDDocument.createNew();

    // append the same 20-page source file 400 times
    for (int i = 0; i < 400; i++) {
        //System.out.println(Runtime.getRuntime().freeMemory());
        PDDocument srcDocument = PDDocument.createFromLocator(new FileLocator("/Users/test/Downloads/2.pdf"));
        mergeDocuments(dstDocument, srcDocument);
    }
    // the destination is saved only once, after all 8000 pages have been appended
    FileLocator destinationLocator = new FileLocator(sOutFileFullPathAndName);
    dstDocument.save(destinationLocator, null);
    dstDocument.close();
}

private void mergeDocuments(PDDocument dstDocument, PDDocument srcDocument) {
    PDPageTree pageTree = srcDocument.getPageTree();
    int pageCount = pageTree.getCount();
    for (int index = 0; index < pageCount; index++) {
        PDPage srcPage = pageTree.getPageAt(index);
        appendPage(dstDocument, srcPage);

        srcPage = null;
    }
}

private void appendPage(PDDocument document, PDPage page) {
    PDResources srcResources = page.getResources();
    CSContent cSContent = page.getContentStream();
    PDPage newPage = (PDPage) PDPage.META.createNew();

    // copy resources from source page to the newly created page
    // (a deep copy per page, so shared fonts and images are duplicated)

    PDResources newResources = (PDResources) PDResources.META
        .createFromCos(srcResources.cosGetObject().copyDeep());
    newPage.setResources(newResources);
    newPage.setContentStream(cSContent);

    // add that new page to the destination document

    document.addPageNode(newPage);
}

Answer 1

PDF is not simply a "stream" of page data. It is a complex data structure containing objects that reference each other: in this concrete case page trees and nodes, content streams, resources, and so on.

jPod keeps persistent objects in memory using weak references only, because they can always be refreshed from the random access data. Once you start updating the object structure, the changed objects get "locked" in memory, simply because the change has not been persisted yet and the object can no longer be refreshed.

Making lots of changes without periodically saving the result will keep the complete structure in memory. I assume that's your problem here. Saving every now and then should reduce the memory footprint.
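The effect described above can be illustrated with a small toy model. This is only a sketch in plain Java and not jPod's actual implementation: the class PinningCache and all of its methods are hypothetical names invented for the illustration. Clean objects are held through weak references, changed objects are pinned with strong references until a save, and the simulation counts how many objects are pinned at the worst moment.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not jPod code): clean objects are weakly held, changed ones are pinned. */
class PinningCache {
    private final Map<Integer, WeakReference<byte[]>> clean = new HashMap<>();
    private final Map<Integer, byte[]> dirty = new HashMap<>();
    private int peakPinned = 0;

    /** An object read from the file: weakly held, because it can be re-read at any time. */
    void load(int id, byte[] data) {
        clean.put(id, new WeakReference<>(data));
    }

    /** A new or modified object: strongly held until the next save. */
    void change(int id, byte[] data) {
        clean.remove(id);
        dirty.put(id, data);
        peakPinned = Math.max(peakPinned, dirty.size());
    }

    /** Returns the object, or null if it was collected and must be re-read from the file. */
    byte[] get(int id) {
        byte[] pinned = dirty.get(id);
        if (pinned != null) {
            return pinned;
        }
        WeakReference<byte[]> ref = clean.get(id);
        return ref == null ? null : ref.get();
    }

    /** Saving persists the changes (omitted here) and demotes the objects to weak references. */
    void save() {
        for (Map.Entry<Integer, byte[]> entry : dirty.entrySet()) {
            clean.put(entry.getKey(), new WeakReference<>(entry.getValue()));
        }
        dirty.clear();
    }

    int pinnedCount() {
        return dirty.size();
    }

    int peakPinnedCount() {
        return peakPinned;
    }

    /** Appends documents * pagesPerDocument pages; saveEvery <= 0 means "save only at the end". */
    static int simulateMerge(int documents, int pagesPerDocument, int saveEvery) {
        PinningCache cache = new PinningCache();
        int nextId = 0;
        for (int d = 1; d <= documents; d++) {
            for (int p = 0; p < pagesPerDocument; p++) {
                cache.change(nextId++, new byte[16]);
            }
            if (saveEvery > 0 && d % saveEvery == 0) {
                cache.save();
            }
        }
        cache.save();
        return cache.peakPinnedCount();
    }

    public static void main(String[] args) {
        System.out.println(simulateMerge(400, 20, 0));  // prints 8000
        System.out.println(simulateMerge(400, 20, 25)); // prints 500
    }
}
```

With the numbers from the question (400 documents of 20 pages), saving only at the end pins all 8000 page objects at once, while saving after every 25 documents caps the peak at 500. The batch size of 25 is an arbitrary choice for the example. With 30 concurrent clients the peak is multiplied accordingly.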

In addition, this algorithm creates a poor page tree: a single linear array containing thousands of pages. You should try to create a balanced tree structure. Another point for optimization is resource handling. Merging resources such as fonts or images may dramatically reduce the target size.
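A balanced page tree can be built bottom-up by grouping nodes into parents with a bounded number of children. The following is a minimal sketch in plain Java that only models the tree shape. It does not use the jPod API, and the class BalancedPageTree, its Node type, and the fan-out of 20 are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Shape-only model of a PDF page tree (hypothetical names, no PDF library involved). */
class BalancedPageTree {

    /** A leaf stands for one page; an inner node stands for a page tree node with kids. */
    static final class Node {
        final int pageIndex;                      // -1 for inner nodes
        final List<Node> kids = new ArrayList<>();
        int count;                                // number of pages below this node

        Node(int pageIndex) {
            this.pageIndex = pageIndex;
            this.count = pageIndex >= 0 ? 1 : 0;
        }
    }

    /** Groups pages level by level so that no node has more than fanOut kids. */
    static Node build(int pageCount, int fanOut) {
        if (pageCount < 1 || fanOut < 2) {
            throw new IllegalArgumentException("need at least one page and a fan-out of 2 or more");
        }
        List<Node> level = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            level.add(new Node(i));
        }
        // Always run at least once, so that even a single page gets an inner root node.
        do {
            List<Node> parents = new ArrayList<>();
            for (int start = 0; start < level.size(); start += fanOut) {
                Node parent = new Node(-1);
                int end = Math.min(start + fanOut, level.size());
                for (Node kid : level.subList(start, end)) {
                    parent.kids.add(kid);
                    parent.count += kid.count;
                }
                parents.add(parent);
            }
            level = parents;
        } while (level.size() > 1);
        return level.get(0);
    }

    /** Number of inner levels between the root and the pages. */
    static int depth(Node node) {
        if (node.kids.isEmpty()) {
            return 0;
        }
        int deepest = 0;
        for (Node kid : node.kids) {
            deepest = Math.max(deepest, depth(kid));
        }
        return deepest + 1;
    }

    /** Largest number of kids found on any node of the tree. */
    static int maxKids(Node node) {
        int widest = node.kids.size();
        for (Node kid : node.kids) {
            widest = Math.max(widest, maxKids(kid));
        }
        return widest;
    }

    public static void main(String[] args) {
        Node root = build(8000, 20);
        System.out.println(root.count);    // prints 8000
        System.out.println(depth(root));   // prints 3
        System.out.println(maxKids(root)); // prints 20
    }
}
```

For the 8000 pages in the question, a fan-out of 20 gives three levels of inner nodes with at most 20 children each, instead of one node with 8000 children. A reader looking up a page then touches a few small arrays rather than one very large one.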

Solution 1
0 2018-07-26 11:33:54
