java.lang.OutOfMemoryError when validating pdf with pdfbox preflight 2.0.13

Details on the issue: PDFBOX-4450

Not sure if anyone has encountered this issue, but I am getting an OutOfMemoryError when validating PDFs. Posting here for visibility; if anyone could help, that would be awesome.

If anyone has any ideas, please share. At this point I can't really move forward.

Stuff I've tried

  • Followed the suggestions in the wiki (PDFBox FAQ), without success
  • Increased the max heap size from 2GB to 4GB
  • Removed the JVM arg -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
  • Tried using JDK 1.7
  • Used a scratch file (from the wiki)
  • Disabled the cache for PDImageXObject (from the wiki)

My Environment

  • Linux 64 bit (Arch Linux)
  • Java 8
  • PDFBox/Preflight ver. 2.0.13
  • jbig imageio ver. 3.0.2

Java info

java -version

java version "1.8.0_131"

Java(TM) SE Runtime Environment (build 1.8.0_131-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

JVM Args used

java -Xmx2048m -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
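
When experimenting with heap sizes, it can help to confirm that the setting reaches the JVM that actually runs the validation (for example when the process is started through a wrapper script or build tool). The following is a minimal sketch using only the standard library; the class name HeapSettingsCheck is hypothetical and not part of PDFBox. The reported maximum can be somewhat lower than the -Xmx value, depending on the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of PDFBox: reports the heap limit and the
// JVM arguments of the process it runs in.
class HeapSettingsCheck {

    // Maximum amount of heap the JVM will attempt to use, in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Arguments passed to the JVM itself (such as -Xmx...), not program arguments.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments:  " + jvmArguments());
    }
}
```

Calling the two static methods from the start of the validation program would show whether the -Xmx and -D options listed above were picked up.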

Example pdf

Pdf from PDFBOX-4450

Console Output

Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font Symbol
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1531)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.checkGroup(XObjFormValidator.java:138)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:73)
at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:74)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:224)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:81)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)

Sample code

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;

public class Validator {
  private File file = null;
  private List<ValidationError> errorList = new ArrayList<ValidationError>();

  public Validator(File file) {
    this.file = file;
  }

  public List<ValidationError> getErrors(){
    return errorList;
  }

  public boolean validate() throws Exception{
    PreflightParser parser = null;
    PreflightDocument document = null;
    ValidationResult result = null;
    try {
      parser = new PreflightParser(file);
      parser.parse();
      document = parser.getPreflightDocument();
      document.validate();
      result = document.getResult();
      errorList = result.getErrorsList();
    }
    finally {
      // Always release the document, even when parsing or validation throws.
      if(document != null) {
        try {
          document.close();
        }catch(Exception ignored) {}
      }
    }
    // Note: returns true when validation errors were found.
    return !errorList.isEmpty();
  }
}

When I add these options:

-XX:+HeapDumpOnOutOfMemoryError -Xmx3550m -Xms3550m -Xmn2g 

It failed again. I then used VisualVM to analyze the heap dump file and found something interesting.

[Image: heap dump file]

Most of the char[] instances have the following content:

[Image: char[] content]

I found the corresponding code in:

//org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess#validateGroupTransparency
    protected void validateGroupTransparency(PreflightContext context, PDPage page) throws ValidationException
    {
        COSBase baseGroup = page.getCOSObject().getItem(XOBJECT_DICTIONARY_KEY_GROUP);
        COSDictionary groupDictionary = COSUtils.getAsDictionary(baseGroup, context.getDocument().getDocument());
        if (groupDictionary != null)
        {
            String sVal = groupDictionary.getNameAsString(COSName.S);
            if (XOBJECT_DICTIONARY_VALUE_S_TRANSPARENCY.equals(sVal))
            {
                context.addValidationError(new ValidationError(ERROR_GRAPHIC_TRANSPARENCY_GROUP,
                        "Group has a transparency S entry or the S entry is null"));
            }
        }
    }

It creates a ValidationError object, and the constructor is:

public ValidationError(String errorCode, String details, Throwable cause)
        {
            this(errorCode);
            if (details != null)
            {
                StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
                sb.append(this.details).append(", ").append(details);
                this.details = sb.toString();
            }
            this.cause = cause;
            t = new Exception();
        }

You can see that every time there is an error, it creates a ValidationError, which in turn creates a StringBuilder and a new details String.
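
To illustrate the allocation pattern, here is a minimal standalone sketch, not PDFBox code. The class DuplicateDetailsDemo and the message texts are placeholders; only the StringBuilder concatenation mirrors the constructor shown above. It counts String objects by identity to show that every call produces a separate object even when the text is identical.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the constructor's string building; not PDFBox code.
class DuplicateDetailsDemo {

    // Mirrors the concatenation pattern used in the constructor.
    static String buildDetails(String base, String details) {
        StringBuilder sb = new StringBuilder(base.length() + details.length() + 2);
        sb.append(base).append(", ").append(details);
        return sb.toString();
    }

    // Counts distinct String objects by identity (==), not by equals().
    static int distinctInstances(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<String, Boolean>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            // Placeholder texts; the real messages come from PDFBox.
            all.add(buildDetails("base message", "some details"));
        }
        System.out.println(distinctInstances(all) + " String objects hold the same text");
    }
}
```

Each object carries its own copy of the characters, so a document that triggers the same error many times keeps many copies of the same message alive for as long as the error list is referenced.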

So you have three ways to solve the problem:

  1. You can increase your heap size. 4G is not enough; try 16G or more.
  2. Don't use the PDFBox library.
  3. Change the PDFBox source code:
    public ValidationError(String errorCode, String details, Throwable cause)
    {
        this(errorCode);
        if (details != null)
        {
            String key = errorCode + details;
            if (commonDetailMap.containsKey(key)) {
                this.details = commonDetailMap.get(key);
            } else {
                StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
                sb.append(this.details).append(", ").append(details);
                this.details = sb.toString();
                commonDetailMap.put(key, this.details);
            }

        }
        this.cause = cause;
        t = new Exception();
    }

I think using a Map to avoid creating too many StringBuilders would work. But the Map would grow too large if the error code and details take many different values.

So another way to change the source code is:
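
One way to keep such a Map from growing without limit is to bound its size. The following is only a sketch under assumptions: BoundedDetailCache, detailsFor and entryCount are hypothetical names, not PDFBox API, and the eviction policy (least recently used, via LinkedHashMap) is one option among several.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: a size-bounded cache of detail strings.
class BoundedDetailCache {
    private final Map<String, String> cache;

    BoundedDetailCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently
        // used, so removeEldestEntry evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns one shared String per (errorCode, details) pair for as long as
    // the pair stays in the cache.
    synchronized String detailsFor(String errorCode, String baseDetails, String details) {
        String key = errorCode + '\n' + details;
        String cached = cache.get(key);
        if (cached == null) {
            cached = baseDetails + ", " + details;
            cache.put(key, cached);
        }
        return cached;
    }

    // Number of entries currently held.
    synchronized int entryCount() {
        return cache.size();
    }
}
```

The lookup key is still built on every call, but it is a short-lived object, whereas the cached details string is shared by all errors with the same code and details.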

    public ValidationError(String errorCode, String details, Throwable cause)
    {
        this(errorCode);
        if (details != null)
        {
            StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
            sb.append(this.details).append(", ").append(details);
            // invoke intern
            this.details = sb.toString().intern();
        }
        this.cause = cause;
        t = new Exception();
    }

The Javadoc for intern() says:

Returns a canonical representation for the string object.

I think that using intern() is better.
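
A small standalone sketch of what intern() changes, using placeholder texts and a hypothetical class name InternDemo (not PDFBox code): two strings built separately are equal but are different objects, while their interned forms are the same object.

```java
// Hypothetical demo class, not part of PDFBox.
class InternDemo {

    // Same concatenation pattern as the constructor above.
    static String buildDetails(String base, String details) {
        return new StringBuilder(base.length() + details.length() + 2)
                .append(base).append(", ").append(details).toString();
    }

    public static void main(String[] args) {
        String first = buildDetails("base message", "some details");
        String second = buildDetails("base message", "some details");

        System.out.println(first.equals(second));              // true: same text
        System.out.println(first == second);                   // false: two objects
        System.out.println(first.intern() == second.intern()); // true: one canonical object
    }
}
```

Note that intern() only saves memory when many errors share identical text; the temporary string is still built before it is interned and then becomes garbage.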
