
Convert HTML to searchable PDF using PhantomJS / Node.js

I am generating PDFs server-side from an HTML template that is filled in with data from the client and server. The code below works, but:
1) The PDF file is 5x bigger than when the page is 'Saved as PDF' on the client side.
2) The PDF is not searchable.

I am assuming both of these problems stem from PhantomJS generating a raster-based rather than a vector-based PDF. What should I do differently? (I am hoping I am just missing a PhantomJS option or two.)

    var phantom = require('phantom');

    // The invoice number doubles as the output file name.
    req.body['invoicenumber'] = 15010001;
    phantom.create(function(ph) {
        ph.createPage(function(page) {
            page.set('paperSize', { format: 'Letter', orientation: 'portrait', margin: '1cm' });
            page.open("html/template.html", function(status) {
                // Runs inside the page: fill in the template once the DOM is ready.
                page.evaluate(function(data) {
                    $(function() { populate(data); });
                }, function() {
                    // Runs back in Node after evaluate returns: write the PDF, then respond.
                    page.render('quotes/' + req.body['invoicenumber'] + '.pdf', function() {
                        ph.exit();
                        res.send(req.body['invoicenumber'] + '.pdf');
                    });
                }, req.body);
            });
        });
    });

MINOR UPDATE: Increasing the margin so that the page is not scaled up reduces the file size, but it is still 2.5x the size of the client-side 'Save as PDF' output.
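One rough way to test the raster vs. vector assumption is to look at what the generated file actually contains. The sketch below is only a heuristic, not a PDF parser: `countPdfObjects` is a hypothetical helper that scans the raw bytes for font and image object markers, and it assumes the object dictionaries are stored uncompressed (object streams would hide them).

```javascript
// Heuristic only: count font and image objects in the raw bytes of a PDF.
// Assumes object dictionaries are not packed into compressed object streams.
function countPdfObjects(buffer) {
    // latin1 maps every byte to one character, so binary data is not mangled.
    var text = buffer.toString('latin1');
    // \b keeps "/Type /FontDescriptor" from being counted as a font object.
    var fonts = (text.match(/\/Type\s*\/Font\b/g) || []).length;
    var images = (text.match(/\/Subtype\s*\/Image\b/g) || []).length;
    return { fonts: fonts, images: images };
}
```

For example, `countPdfObjects(require('fs').readFileSync('quotes/15010001.pdf'))` could be run against both the PhantomJS output and the client-side 'Save as PDF' file. A file with no fonts and one or more images per page would be consistent with rasterized text, though it would not prove it.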

In the HTML template, try wrapping your content in heading tags (h1, h2, ... h6). Content inside these tags is rendered as text in the generated PDF, so it should be searchable. This should also reduce the PDF file size considerably. I am not sure why div, p, table, and similar tags are rendered as images in the PDF.
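As a minimal sketch of that suggestion, assuming the template's `populate(data)` builds its markup from strings: `escapeHtml` and `headingTag` below are hypothetical helpers, not part of PhantomJS or the original template, and whether heading tags actually avoid rasterization is the claim above rather than something this code verifies.

```javascript
// Hypothetical helper: escape text so data values cannot break the markup.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical helper: wrap a value in a heading tag, h1 through h6.
function headingTag(level, text) {
    if (level % 1 !== 0 || level < 1 || level > 6) {
        throw new RangeError('heading level must be an integer from 1 to 6');
    }
    return '<h' + level + '>' + escapeHtml(text) + '</h' + level + '>';
}
```

Inside `populate(data)`, a value could then be inserted with something like `$('#invoice').html(headingTag(3, data.invoicenumber))`, where `#invoice` is an assumed element id. The heading's default size and margins would likely need CSS overrides to match the original layout.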
