
Edit pdf file in AWS S3 bucket using iText

An AWS S3 bucket contains one PDF file. The content of this PDF file needs to be edited using the iText Java library, and the modified file needs to be stored in an S3 bucket again. Currently we are using an AWS Lambda function. An empty PDF file is created in the destination S3 bucket, and AWS CloudWatch shows the error message: "Pipe closed"

Lambda Java code:

private String bucketName = "forms-storage";

public String getProposalPdf(InputRequest inputRequest, Context context) throws DocumentException, IOException{

    final BasicAWSCredentials awsCreds = new BasicAWSCredentials(ConstantValues.AccessKey, ConstantValues.SecretKey);
    final AmazonS3Client s3client = (AmazonS3Client) AmazonS3ClientBuilder.standard().withRegion(Regions.AP_SOUTH_1)
                    .withCredentials(new AWSStaticCredentialsProvider(awsCreds)).build();
    S3Object object = s3client.getObject(new GetObjectRequest(bucketName, "forms/COMBO ver 1.1.pdf"));
    InputStream objectData = object.getObjectContent();

    PdfReader reader;
    PdfStamper stamper = null;
    BaseFont bf;

    PipedOutputStream pdfBytes = new PipedOutputStream();

    try {           
        reader = new PdfReader(objectData);
        stamper = new PdfStamper(reader, pdfBytes);

        bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);

        PdfContentByte over = stamper.getOverContent(1);
        over.beginText();
        over.setColorFill(BaseColor.BLACK);
        over.setFontAndSize(bf, 12);
        over.setTextMatrix(120,717);
        over.showText("this is edited text");
        over.endText();

        PipedInputStream inputStream = new PipedInputStream(pdfBytes);

        ObjectMetadata meta = new ObjectMetadata();
        meta= object.getObjectMetadata();
        meta.setContentLength(inputStream.available());         

        s3client.putObject(new PutObjectRequest(bucketName, "forms/123.pdf", inputStream, meta));           

    } catch (IOException e) {
        e.printStackTrace();
    } catch (DocumentException e) {
        e.printStackTrace();
    } 
    finally
    {
        stamper.close();            
        objectData.close();
    }
    return "PDF Created";
}

The problem is not in AWS or iText, but in the way you are dealing with PipedInputStream and PipedOutputStream.

In particular, most of the PDF data is only written when stamper.close() is called, but you set the content length with meta.setContentLength(inputStream.available()) before closing the stamper, so the length is wrong. Furthermore, once putObject returns, the inputStream instance is closed (check its internal closedByReader field); pdfBytes is still connected to it but can no longer write to it, so when stamper.close() is finally called you get the "Pipe closed" exception because nothing can be written into the pipe anymore.
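
You can see the same failure in a tiny standalone sketch (illustration only, not part of the fix): closing the reading end of a pipe makes every later write to the writing end fail.

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeClosedDemo {
    public static void main(String[] args) throws IOException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        in.close();    // comparable to inputStream being closed after putObject returns
        out.write(42); // throws java.io.IOException: Pipe closed
    }
}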

I don't think any attempt to fix this within the current approach will suffice, because the documentation clearly states:

Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread. Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread.

So one solution, although not particularly memory efficient, would be to use ByteArrayOutputStream and ByteArrayInputStream:

ByteArrayOutputStream pdfBytes = new ByteArrayOutputStream();

try {
    reader = new PdfReader(objectData);
    stamper = new PdfStamper(reader, pdfBytes);

    bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);

    PdfContentByte over = stamper.getOverContent(1);
    over.beginText();
    over.setColorFill(BaseColor.BLACK);
    over.setFontAndSize(bf, 12);
    over.setTextMatrix(120,717);
    over.showText("this is edited text");
    over.endText();

    stamper.close();
    objectData.close();

    // Reuse the source object's metadata; the length is now correct because the
    // stamper has already been closed and the full PDF is in pdfBytes.
    ObjectMetadata meta = object.getObjectMetadata();
    ByteArrayInputStream inputStream = new ByteArrayInputStream(pdfBytes.toByteArray());
    meta.setContentLength(inputStream.available()); // same as pdfBytes.size()

    s3client.putObject(new PutObjectRequest(bucketName, "forms/123.pdf", inputStream, meta));      

} catch (IOException e) {
    e.printStackTrace();
} catch (DocumentException e) {
    e.printStackTrace();
}

Typically PDFs are small enough that you can afford to keep them in memory. If you do want to optimize memory consumption, do the PDF processing in a separate thread; I recommend checking this article or searching for generic examples of using PipedInputStream with PipedOutputStream.
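
For illustration, here is a rough sketch of that two-thread piped approach, assuming the same iText 5 and AWS SDK for Java v1 classes used above and reusing the s3client, bucketName and objectData variables from the question's code (the overlay code is elided):

// Rough sketch only: reuses s3client, bucketName and objectData from the code above.
PipedOutputStream pdfBytes = new PipedOutputStream();
PipedInputStream inputStream = new PipedInputStream(pdfBytes);

// Writer thread: stamps the PDF and pushes the bytes into the pipe.
Thread writer = new Thread(() -> {
    try {
        PdfReader reader = new PdfReader(objectData);
        PdfStamper stamper = new PdfStamper(reader, pdfBytes);
        // ... add the overlay text exactly as in the code above ...
        stamper.close(); // flushes the bulk of the PDF data into the pipe
        reader.close();
    } catch (IOException | DocumentException e) {
        e.printStackTrace();
    } finally {
        try { pdfBytes.close(); } catch (IOException ignored) { }
    }
});
writer.start();

// Reader side (current thread): upload while the writer thread produces the data.
// The exact content length is unknown here, so the SDK logs a warning and may
// buffer the stream in memory, which partly defeats the purpose; setting the
// content type is still worthwhile.
ObjectMetadata meta = new ObjectMetadata();
meta.setContentType("application/pdf");
s3client.putObject(new PutObjectRequest(bucketName, "forms/123.pdf", inputStream, meta));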
