Borderless PDF extraction to JSON is not working properly with the Python Camelot library
Can anyone help? We are extracting a PDF to JSON with Python Camelot, and the output does not match the PDF exactly: some content is missing after extraction.
I tried the following code:
import camelot
pdf_path = '/YOUR/FILEPATH.pdf'
tables = camelot.read_pdf(pdf_path, flavor='stream')
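To get a first idea of where content goes missing, it can help to look at the parsing report Camelot attaches to each detected table after a call like the one above. The sketch below assumes `parsing_report` is a dict with an `accuracy` entry, as described in the Camelot documentation. The helper names and the threshold of 90 are made up for illustration, and a high accuracy value does not guarantee that the cells were split correctly.

```python
def low_accuracy_tables(reports, threshold=90.0):
    """Return the indexes of reports whose 'accuracy' is below the threshold.

    `reports` is a list of dicts shaped like Camelot's parsing_report,
    e.g. {'accuracy': 99.02, 'whitespace': 12.24, 'order': 1, 'page': 1}.
    A report without an 'accuracy' key is treated as accuracy 0.
    """
    return [i for i, report in enumerate(reports)
            if report.get("accuracy", 0.0) < threshold]


def collect_reports(pdf_path, flavor="stream"):
    """Read a PDF with Camelot and return one parsing report per table."""
    import camelot  # imported here so the helper above works without Camelot

    tables = camelot.read_pdf(pdf_path, flavor=flavor)
    return [table.parsing_report for table in tables]


# Hypothetical reports, only to show the shape of the data.
example_reports = [
    {"accuracy": 99.02, "whitespace": 12.24, "order": 1, "page": 1},
    {"accuracy": 61.5, "whitespace": 40.0, "order": 2, "page": 1},
]
print(low_accuracy_tables(example_reports))
```

With a real file, `low_accuracy_tables(collect_reports(pdf_path))` would list the tables worth inspecting by hand before exporting them to JSON.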
There are two problems:
First, the extracted text contains tokens such as (cid:71) ... (cid:71) ... in place of readable characters.
Second, using flavor='lattice', the table isn't detected. Using flavor='stream', the table is detected, but the cells aren't properly detected.
At the moment, I think that Camelot can't properly extract this table. They are working on fixing the second problem (see this and this).
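Tokens like (cid:71) usually mean that the PDF's font gives the text extractor no usable mapping from glyph IDs to Unicode, so the glyph number is emitted instead of a character. The sketch below is one way to locate affected cells before trusting the JSON output. It is not part of Camelot: the function name `find_cid_cells` and the sample rows are hypothetical, and it only inspects a plain list of rows.

```python
import re

# Matches placeholder tokens such as "(cid:71)" and captures the number.
CID_TOKEN = re.compile(r"\(cid:(\d+)\)")


def find_cid_cells(rows):
    """Return (row_index, column_index, [cid numbers]) for each affected cell."""
    hits = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            cids = [int(n) for n in CID_TOKEN.findall(str(cell))]
            if cids:
                hits.append((r, c, cids))
    return hits


# Hypothetical extracted table, only for illustration.
sample_rows = [
    ["Name", "Value"],
    ["(cid:71)(cid:72) total", "42"],
]
print(find_cid_cells(sample_rows))  # prints [(1, 0, [71, 72])]
```

For a Camelot table, the rows could be obtained with `tables[0].df.values.tolist()`, since `df` is a pandas DataFrame. If many cells are flagged, the missing content likely comes from the PDF's font encoding rather than from table detection, and changing the flavor will not help.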