
I want to convert PDF file data to JSON data using nodejs

I want to convert PDF file data into JSON format. I want the text in the output to be properly structured JSON, but my code just wraps the extracted text in ordinary JSON. What should I use for this? The npm library pdf-parse does not give me the proper format, and neither does pdf2json.

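const fs = require('fs');
const pdf = require('pdf-parse');
// Assumption: `upload` is a multer middleware defined elsewhere, e.g.
// const upload = require('multer')({ dest: 'uploads/' }).any();
module.exports.simplePdfUpload = (req, res) => {
    upload(req, res, (err) => {
        if (err) {
            return res.status(400).send({ error: err.message });
        }
        const dataBuffer = fs.readFileSync(req.files[0].path);
        pdf(dataBuffer)
            .then((data) => {
                // data.text is the extracted plain text; data also has numpages, info, metadata, version
                res.send({ jsondata: data });
            })
            .catch((error) => {
                // Report the failure instead of silently leaving the request hanging
                res.status(500).send({ error: error.message });
            });
    });
};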
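pdf-parse resolves with an object whose `text` property holds the whole document as one string, which is why the response is one long block of text. As a first step toward structured output, here is a minimal sketch that splits that text into one JSON array entry per non-empty line. The helper name `toLineJson` and the `lines` field are hypothetical choices of ours, not part of pdf-parse.

```javascript
// Sketch: reshape pdf-parse's result into JSON with one entry per text line.
// Assumption: `data` is the object pdf-parse resolves with (it has `text`,
// `numpages`, ...); `toLineJson` and the `lines` field are hypothetical names.
function toLineJson(data) {
  const lines = data.text
    .split('\n')
    .map((line) => line.replace(/\s+/g, ' ').trim()) // collapse runs of whitespace
    .filter((line) => line.length > 0);              // drop blank lines
  return { numpages: data.numpages, lines };
}

module.exports = { toLineJson };
```

Inside the `.then` callback you could call `res.send(toLineJson(data))`. These are physical lines of the PDF layout, not sentences or fields, so turning them into keys and values still needs rules specific to your document.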

OUTPUT:

{
    'waters including interstate wetlands; (3) all other waters such as ' +
    'intrastate lakes, rivers, streams (including intermittent \nstreams),  ' +
    'mudflats,  sandflats,  wetlands,  sloughs,  prairie  potholes,  wet  ' +
    'meadows,  playa  lakes,  or  natural  ponds,  etc.,  which  the  use, \n' +
    'degradation, or destruction could affect interstate/ foreign commerce; (4) ' +
    'all impoundments of waters otherwise defined as waters of the U. S., \n(5) ' +
    'tributaries of waters identified in 1 through 4 above; (6) the territorial ' +
    'seas; and (7) wetlands adjacent to waters identified in 1 through 6 \n' +
    'above. Only the USACE has the authority to make a final wetlands ' +
    'jurisdictional determination. \n ',
    version: '1.10.100'
} 

But I want output of this type:

{
    "Info":
    {
        "Company": "ABC",
        "Team": "node"
    },
    "Number of members": 4,
    "Time to finish": "1 day"
}
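Extracted PDF text carries no key/value structure of its own, so producing JSON like this means parsing the text yourself. This is a minimal sketch, assuming the PDF contains lines such as `Company: ABC` and `Number of members: 4`. The function names and the `INFO_KEYS` list, which decides what goes under `"Info"`, are hypothetical and would depend on your document.

```javascript
// Sketch: turn "Key: value" lines of extracted text into a JSON object.
// Assumption: the text contains lines like "Company: ABC"; the names below
// are hypothetical, not part of any library.
function parseKeyValues(text) {
  const result = {};
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    const idx = line.indexOf(':');
    if (idx <= 0) continue; // skip lines without a "Key:" prefix
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    // Store purely numeric values as numbers, everything else as strings.
    result[key] = /^-?\d+(\.\d+)?$/.test(value) ? Number(value) : value;
  }
  return result;
}

// Hypothetical: which flat keys belong in the nested "Info" object.
const INFO_KEYS = ['Company', 'Team'];

function toStructuredJson(text) {
  const flat = parseKeyValues(text);
  const out = { Info: {} };
  for (const [key, value] of Object.entries(flat)) {
    if (INFO_KEYS.includes(key)) out.Info[key] = value;
    else out[key] = value;
  }
  return out;
}

module.exports = { parseKeyValues, toStructuredJson };
```

Given text with the lines `Company: ABC`, `Team: node`, `Number of members: 4` and `Time to finish: 1 day`, `toStructuredJson` returns the object shown above. Real PDFs often break or merge such lines, so expect to adjust the parsing rules.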

pdf-parse only converts PDF to plain text; you need to use another library for this.

Try pdf.js-extract.
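pdf.js-extract returns, for each page, a `content` array of text fragments with their positions (`x`, `y`, `str`, among other fields), which is more useful than flat text when you need to recover layout. This is a minimal sketch, assuming that output shape and that `y` grows down the page. `groupIntoLines`, `extractPages` and the 2-unit tolerance are our own hypothetical choices. Fragments are joined with a single space, which can split words that a PDF broke into several fragments.

```javascript
// Sketch: rebuild text lines from pdf.js-extract's positioned text items.
// Assumptions: pdf.js-extract is installed, page.content items have x, y, str,
// and y grows down the page (if lines come out reversed, sort y descending).
function groupIntoLines(items, tolerance = 2) {
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines = [];
  for (const item of sorted) {
    const last = lines[lines.length - 1];
    // Items whose y is within `tolerance` of the line's first item share a line.
    if (last && Math.abs(last.y - item.y) <= tolerance) {
      last.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }
  // Within each line, order fragments left to right and join their text.
  return lines
    .map((line) =>
      line.items
        .sort((a, b) => a.x - b.x)
        .map((i) => i.str)
        .join(' ')
        .replace(/\s+/g, ' ')
        .trim()
    )
    .filter((s) => s.length > 0);
}

function extractPages(buffer) {
  // Required lazily so groupIntoLines can be used without the package installed.
  const { PDFExtract } = require('pdf.js-extract');
  return new Promise((resolve, reject) => {
    new PDFExtract().extractBuffer(buffer, {}, (err, data) => {
      if (err) return reject(err);
      resolve(
        data.pages.map((page, index) => ({
          page: index + 1,
          lines: groupIntoLines(page.content),
        }))
      );
    });
  });
}

module.exports = { groupIntoLines, extractPages };
```

The resulting `lines` per page can then be fed to a key/value parser to build the final JSON, with positions giving you more to work with than pdf-parse's flat text when fields sit in columns.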

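Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.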

 
© 2020-2024 STACKOOM.COM