I'm not sure whether anyone else has run into this, but I'm getting an OutOfMemoryError when validating PDFs with PDFBox Preflight. Posting here for visibility; any help would be awesome.
If anyone has any ideas, please share. At this point I can't move forward.
Stuff I've tried
Followed the suggestions in the PDFBox FAQ wiki, without success
Increased max heap size from 2GB to 4GB
Removed the JVM arg -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
Tried using JDK 1.7
My Environment
Java info
java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
JVM Args used
java -Xmx2048m -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
Example pdf
Console Output
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font Symbol
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1531)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.checkGroup(XObjFormValidator.java:138)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:73)
at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:74)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:224)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:81)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
Sample code
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;

public class Validator {
    private File file = null;
    private List<ValidationError> errorList = new ArrayList<ValidationError>();

    public Validator(File file) {
        this.file = file;
    }

    public List<ValidationError> getErrors() {
        return errorList;
    }

    // Returns true if the validation reported at least one error.
    public boolean validate() throws Exception {
        PreflightParser parser = null;
        PreflightDocument document = null;
        ValidationResult result = null;
        try {
            parser = new PreflightParser(file);
            parser.parse();
            document = parser.getPreflightDocument();
            document.validate();
            result = document.getResult();
            errorList = result.getErrorsList();
        }
        catch (Exception e) {
            throw e;
        }
        finally {
            if (document != null) {
                try {
                    document.close();
                } catch (Exception ignored) {}
            }
            parser = null;
            document = null;
            result = null;
        }
        return !errorList.isEmpty();
    }
}
When I add these options:
-XX:+HeapDumpOnOutOfMemoryError -Xmx3550m -Xms3550m -Xmn2g
It failed again, so I used VisualVM to analyze the heap dump file and found something interesting.
And most of char[]'s content is:
// org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess#validateGroupTransparency
protected void validateGroupTransparency(PreflightContext context, PDPage page) throws ValidationException
{
    COSBase baseGroup = page.getCOSObject().getItem(XOBJECT_DICTIONARY_KEY_GROUP);
    COSDictionary groupDictionary = COSUtils.getAsDictionary(baseGroup, context.getDocument().getDocument());
    if (groupDictionary != null)
    {
        String sVal = groupDictionary.getNameAsString(COSName.S);
        if (XOBJECT_DICTIONARY_VALUE_S_TRANSPARENCY.equals(sVal))
        {
            context.addValidationError(new ValidationError(ERROR_GRAPHIC_TRANSPARENCY_GROUP,
                    "Group has a transparency S entry or the S entry is null"));
        }
    }
}
It creates a ValidationError object, and the constructor is:
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
        sb.append(this.details).append(", ").append(details);
        this.details = sb.toString();
    }
    this.cause = cause;
    t = new Exception();
}
As you can see, every time an error is found, a new ValidationError is created, and with it a new StringBuilder and a new details String, even when the text is identical to one built before.
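To make the duplication concrete, here is a minimal sketch outside of PDFBox. DuplicateDetailsDemo, compose and countInstances are names I made up for illustration, and the message strings are placeholders; only the concatenation pattern is copied from the constructor above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Illustrative only: a hypothetical class, not PDFBox code.
final class DuplicateDetailsDemo {

    // Same concatenation pattern as the ValidationError constructor.
    static String compose(String base, String details) {
        StringBuilder sb = new StringBuilder(base.length() + details.length() + 2);
        sb.append(base).append(", ").append(details);
        return sb.toString();
    }

    // Counts distinct String objects, compared by reference rather than by content.
    static int countInstances(Iterable<String> strings) {
        Set<String> byIdentity =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (String s : strings) {
            byIdentity.add(s);
        }
        return byIdentity.size();
    }

    public static void main(String[] args) {
        List<String> details = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            details.add(compose("base message", "same details"));
        }
        System.out.println(countInstances(details));              // prints 1000
        System.out.println(new HashSet<String>(details).size());  // prints 1
    }
}
```

A thousand identical messages end up as a thousand separate objects, which would match a heap dump dominated by char[] arrays with the same content.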
So there are a few ways to solve the problem. One is to cache the composed detail strings in a Map:
// commonDetailMap is assumed to be a static Map<String, String> field
// added to ValidationError (its declaration is not shown here).
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        String key = errorCode + details;
        if (commonDetailMap.containsKey(key)) {
            // Reuse the string that was already built for this code and details.
            this.details = commonDetailMap.get(key);
        } else {
            StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
            sb.append(this.details).append(", ").append(details);
            this.details = sb.toString();
            commonDetailMap.put(key, this.details);
        }
    }
    this.cause = cause;
    t = new Exception();
}
I think using a Map to avoid creating too many duplicate strings would work. But the Map could grow too large if there are many distinct combinations of error code and details.
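If the size of the Map is the worry, one option is to put an upper bound on it. The following is only a sketch under my own assumptions: BoundedDetailCache, compose and entryCount are hypothetical names and nothing like this exists in PDFBox. It relies on the LinkedHashMap removeEldestEntry hook to evict the least recently used entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of PDFBox.
final class BoundedDetailCache {
    private final Map<String, String> cache;

    BoundedDetailCache(final int maxEntries) {
        // accessOrder = true keeps entries in least-recently-used order.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Called after each put; drop the eldest entry once over the limit.
                return size() > maxEntries;
            }
        };
    }

    // Returns one shared "<base>, <details>" string per distinct pair.
    synchronized String compose(String base, String details) {
        // The separator avoids collisions such as ("ab", "c") vs ("a", "bc").
        String key = base + '\u0000' + details;
        String cached = cache.get(key);
        if (cached == null) {
            cached = base + ", " + details;
            cache.put(key, cached);
        }
        return cached;
    }

    synchronized int entryCount() {
        return cache.size();
    }
}
```

Note that building the key still allocates a temporary string on every call, but that string becomes garbage right away instead of being retained by every error object. Strings evicted from the cache stay alive for as long as an existing error still references them.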
Another way to change the source code is:
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
        sb.append(this.details).append(", ").append(details);
        // invoke intern
        this.details = sb.toString().intern();
    }
    this.cause = cause;
    t = new Exception();
}
The Javadoc for String.intern() says:
Returns a canonical representation for the string object.
So equal strings end up sharing a single instance from the string pool. I think using intern() is the better option.
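Here is a small standalone check of what intern() does to strings built this way. InternDemo and compose are hypothetical names and the base message is a placeholder; only String.intern() and the concatenation pattern come from the code above.

```java
// Illustrative only: a hypothetical class, not PDFBox code.
final class InternDemo {

    // Same concatenation pattern as the ValidationError constructor.
    static String compose(String base, String details) {
        StringBuilder sb = new StringBuilder(base.length() + details.length() + 2);
        sb.append(base).append(", ").append(details);
        return sb.toString();
    }

    public static void main(String[] args) {
        String details = "Group has a transparency S entry or the S entry is null";
        String first = compose("base message", details);
        String second = compose("base message", details);

        // Equal content, but two separate objects on the heap.
        System.out.println(first.equals(second));               // prints true
        System.out.println(first == second);                    // prints false

        // intern() returns the same canonical instance for both.
        System.out.println(first.intern() == second.intern());  // prints true
    }
}
```

The temporary string from sb.toString() is still created on each call, but once the interned reference is stored, the temporary copy can be collected, so only one copy per distinct message is retained.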