I have a PDF that contains math equations like this
I am trying to extract the objective questions from a PDF file and convert them into a CSV file using Python, so that each row of the table contains a question, its four options (one per column), and the correct option, for six columns in total. The PDF also contains mathematical equations, which I can't write into the CSV file as they appear. Is it possible to write those equations to my CSV file exactly as they are in the PDF?
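This depends on how the formula is represented in the PDF. It can be an XObject, an inline image, or Unicode text. Only Unicode text can be written to a CSV cell directly; a formula stored as an image has to be saved separately or converted to text first.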
Try pdfreader. It can extract plain text, text containing PDF commands, and images from PDF documents.
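from pdfreader import SimplePDFViewer, PageDoesNotExist

plain_text = ""     # decoded text of all pages
pdf_markdown = ""   # text together with the PDF commands around it
images = []         # inline images and image XObjects

# your_pdf_file_name is the path to your PDF file
with open(your_pdf_file_name, "rb") as fd:
    viewer = SimplePDFViewer(fd)
    try:
        while True:
            # Parse the current page and fill viewer.canvas
            viewer.render()
            pdf_markdown += viewer.canvas.text_content
            plain_text += "".join(viewer.canvas.strings)
            images.extend(viewer.canvas.inline_images)
            images.extend(viewer.canvas.images.values())
            # Move to the next page; raises PageDoesNotExist after the last one
            viewer.next()
    except PageDoesNotExist:
        pass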
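If the formulas turn out to be Unicode text, they can go into the CSV unchanged as long as the file is written with a Unicode encoding. The following is only a minimal sketch of that last step, using the standard csv module. The function name write_questions, the column names, and the sample row are made up for illustration; splitting plain_text into questions and options depends on the layout of your PDF and is not shown.

```python
import csv

# Hypothetical column names for the six-column layout described in the question.
HEADER = ["question", "option_a", "option_b", "option_c", "option_d", "answer"]


def write_questions(rows, csv_path):
    """Write six-field rows (question, four options, answer) to a CSV file."""
    # utf-8-sig prepends a byte order mark, which helps spreadsheet programs
    # detect the encoding; newline="" lets the csv module handle line endings.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for row in rows:
            if len(row) != len(HEADER):
                raise ValueError(f"expected {len(HEADER)} fields, got {len(row)}")
            writer.writerow(row)


# Made-up sample row with Unicode math symbols kept as ordinary text.
sample_rows = [
    ("What is √16 + 2²?", "4", "6", "8", "10", "C"),
]
```

Calling write_questions(sample_rows, "questions.csv") would produce a header line followed by one row per question. Formulas that were extracted as images cannot be stored this way; one option is to save each image to a file and put its file name in the cell instead.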