Is there an npm package, or web API, for reading specific parts of an image?

Original post: 2019-10-16 18:18:16 | Tags: javascript, node.js, image, parsing, artificial-intelligence

I'm adding a new function to my Node Express server that will let me upload a driver's ELD daily log and extract from that image / PDF the time driven, start time, end time, lunch, etc.

[Image: demo log]

I've looked into converting the PDF into CSV / JSON / HTML, but the issue there is that the result is an unlabeled mess. So I figure I should try to somehow read the chart that is already on the ELD log and build a similar chart from it.

I.e., the reading would be segmented into, say, 15-minute slices, or however many pixels that works out to.

[Image: focus area]

IF a line exists in the segment, proceed and log the data; ELSE check the "SB", "D", and "ON" segments, then call recursively.

[Image: segmented focus area]

In the example shown above, this driver went on duty at 6:45am.
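The segment-by-segment idea above could be sketched roughly as follows. This is only an illustration under assumptions: the chart is taken to be already reduced to a boolean grid with one row per duty status and one column per 15-minute slice (96 columns per day), the row order `OFF`, `SB`, `D`, `ON` is assumed, and a plain loop stands in for the recursive call. Getting from the real image or PDF to such a grid would need an image library and is not shown. All names (`STATUSES`, `statusPerSegment`, `statusChanges`, `makeDemoGrid`) are hypothetical.

```javascript
// Assumed top-to-bottom row order of the duty-status chart.
const STATUSES = ["OFF", "SB", "D", "ON"];

// grid[row][col] is true when the chart line runs through that status row
// during that 15-minute column. Returns one status (or null) per column.
function statusPerSegment(grid) {
  const result = [];
  for (let col = 0; col < grid[0].length; col++) {
    // Check each status row in turn until one contains the line.
    const row = grid.findIndex((band) => band[col]);
    result.push(row === -1 ? null : STATUSES[row]);
  }
  return result;
}

// Convert a column index into an "HH:MM" time of day.
function segmentToTime(index, minutesPerSegment = 15) {
  const total = index * minutesPerSegment;
  const hh = String(Math.floor(total / 60)).padStart(2, "0");
  const mm = String(total % 60).padStart(2, "0");
  return `${hh}:${mm}`;
}

// Keep only the columns where the status differs from the previous column.
function statusChanges(statuses) {
  const changes = [];
  statuses.forEach((status, i) => {
    if (i === 0 || status !== statuses[i - 1]) {
      changes.push({ time: segmentToTime(i), status });
    }
  });
  return changes;
}

// Synthetic grid: off duty until 06:45 (column 27), on duty afterwards.
function makeDemoGrid() {
  const grid = STATUSES.map(() => new Array(96).fill(false));
  for (let col = 0; col < 96; col++) {
    grid[col < 27 ? 0 : 3][col] = true;
  }
  return grid;
}

const changes = statusChanges(statusPerSegment(makeDemoGrid()));
console.log(changes);
```

With the synthetic grid, the second entry in `changes` is the switch to `ON` at `06:45`, matching the example above. A finer grid (fewer minutes per column) would only change `minutesPerSegment` and the column count.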

The files are provided in PDF format, and I am having trouble extracting the data in a useful, labeled form.

UPDATE: Thinking about it a bit more, this solution might be pretty resource-costly, especially if done on the server end, i.e., chopping up the image / leaving it in a buffer and reading off it... Maybe it would be better to just try to make sense of the garbage that comes out of parsing the PDF into something else...

UPDATE 2: I may try Tesseractocr, depending on how it outputs data.

Using it on a page like this:

[Image: demo page 2 of the ELD log]

1 Answer

I think the term you're looking for is OCR (optical character recognition). That's the name of the technology for converting text in images into actual text you can work with. Once you have that, decoding the text should be easy if it's in a standard format. There are plenty of OCR libraries for Node: https://www.npmjs.com/search?q=OCR No need to reinvent the wheel and try to build your own OCR system :)
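To give a feel for the "decoding the text" step, here is a minimal sketch that pulls labeled times out of text an OCR library might return. The sample text, its labels, and the `Label: HH:MM` layout are all assumptions made up for illustration; a real ELD log will look different, and `extractLabeledTimes` is a hypothetical helper, not part of any OCR package.

```javascript
// Hypothetical OCR output. The labels and layout are invented for this sketch.
const sampleOcrText = [
  "Start Time: 06:45",
  "End Time: 17:30",
  "Lunch: 12:00",
].join("\n");

// Collect every line of the form "Some Label: HH:MM" into an object
// keyed by the lower-cased label.
function extractLabeledTimes(text) {
  const fields = {};
  const pattern = /^[ \t]*([A-Za-z ]+?)[ \t]*:[ \t]*(\d{1,2}:\d{2})[ \t]*$/gm;
  for (const match of text.matchAll(pattern)) {
    fields[match[1].trim().toLowerCase()] = match[2];
  }
  return fields;
}

const fields = extractLabeledTimes(sampleOcrText);
console.log(fields);
```

Lines that do not fit the assumed pattern are simply skipped, so OCR noise between fields would not break the parse, though misread digits or labels would still need handling.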