How to run pdfjs in a Lambda?

I'm using the pdfjs library to extract text from a PDF (sample code below). How can I run this code in a Lambda when the path is an S3 location? Do I have to read the file as bytes first, and if so, how do I pass them to the pdfjs library?

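// Assumes the pdfjs-dist package; the entry-point path varies between versions.
const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js');

// Return the text fragments found on the first page of the PDF at `path`.
async function getText(path) {
    const doc = await pdfjsLib.getDocument(path).promise;
    const page = await doc.getPage(1);
    const content = await page.getTextContent();
    return content.items.map((item) => item.str);
}

(async () => {
    const data = await getText('./file.pdf');
    console.log(data);
})();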

It's quite easy.

First, you have to authorize your Lambda (through its execution role) to access your previously created bucket.

https://aws.amazon.com/premiumsupport/knowledge-center/lambda-execution-role-s3-bucket/?nc1=h_ls

You can then use the S3 JavaScript SDK to read the file. I use Python, so I can't give you the exact code, but it should be straightforward.

https://docs.aws.amazon.com/es_es/sdk-for-javascript/v2/developer-guide/s3-examples.html
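
For what it's worth, here is a rough sketch of how the two pieces might fit together: download the object with the v2 SDK's getObject, convert the returned Buffer to a Uint8Array, and hand it to pdfjs through the `data` option of getDocument. Several things here are assumptions rather than anything from the question: the `s3://bucket/key` input format, the `event.s3Uri` field, the helper names `parseS3Uri` and `getTextFromS3`, that `aws-sdk` v2 and `pdfjs-dist` are packaged with the function, and the `pdfjs-dist/legacy/build/pdf.js` entry point (it differs between pdfjs-dist versions).

```javascript
// Split a hypothetical "s3://bucket/key" string into getObject parameters.
function parseS3Uri(uri) {
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(uri);
  if (!match) throw new Error(`Not an S3 URI: ${uri}`);
  return { Bucket: match[1], Key: match[2] };
}

// Download the object and return the text items of its first page.
// `s3` is an aws-sdk v2 S3 client and `pdfjsLib` is the pdfjs-dist module.
async function getTextFromS3(s3, pdfjsLib, uri) {
  const object = await s3.getObject(parseS3Uri(uri)).promise();
  // In Node.js the Body arrives as a Buffer; copy it into a plain Uint8Array for pdfjs.
  const data = new Uint8Array(object.Body);
  const doc = await pdfjsLib.getDocument({ data }).promise;
  const page = await doc.getPage(1);
  const content = await page.getTextContent();
  return content.items.map((item) => item.str);
}

// Lambda entry point. The event shape ({ s3Uri }) is an assumption.
async function handler(event) {
  const AWS = require('aws-sdk'); // assumption: v2 SDK is packaged with the function
  const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js'); // path varies by version
  return getTextFromS3(new AWS.S3(), pdfjsLib, event.s3Uri);
}

exports.handler = handler;
```

Passing the S3 client and the pdfjs module in as arguments keeps `getTextFromS3` easy to try locally with stand-ins before deploying. The same approach should work with the v3 SDK, but the call for reading the body is different there.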
