
How to decode base64 data in an image to text?

I was asked this peculiar question today and I couldn't give a straight answer.

I have an image depicting base64 text. How can I convert this to text?

I tried this with pytesseract, but Tesseract has a language-model component that garbles the text, so I don't think that's the way to go. I researched a bit, but it doesn't seem to be a common problem (to say the least). I've no clue how it could be useful, but it sure is vexing!

What other things could I try?

What an interesting question. The task isn't that unusual, though: I've seen people extract plenty of jumbled, non-dictionary words from images before. Extracting a long, unbroken line of base64 text could prove more challenging. Some OCR tools I've seen used are:

opencv-python, the Python wrapper for OpenCV

pytesseract, the Python wrapper for Tesseract (as you mentioned)

More OCR wrappers I found other than the two popular ones: https://pythonrepo.com/repo/kba-awesome-ocr-python-computer-vision
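Since the language component is what garbles the text, one thing worth trying with pytesseract is restricting Tesseract to the base64 alphabet and turning off its dictionaries. The following is only a sketch under assumptions: `build_tesseract_config` and `ocr_base64` are hypothetical helper names, page segmentation mode 6 (a single uniform block of text) is a guess about the image layout, and whether the LSTM engine honours the whitelist depends on the Tesseract version installed (it is reported to work from 4.1 onward).

```python
import string

# The 64 symbols of the standard base64 alphabet plus '=' padding.
BASE64_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="


def build_tesseract_config(psm: int = 6) -> str:
    """Build a Tesseract config string limited to base64 characters.

    load_system_dawg / load_freq_dawg switch off the word dictionaries,
    which is an attempt to stop the language model "correcting" the text.
    """
    return (
        f"--psm {psm} "
        f"-c tessedit_char_whitelist={BASE64_CHARS} "
        "-c load_system_dawg=0 -c load_freq_dawg=0"
    )


def ocr_base64(image_path: str) -> str:
    """Run OCR on an image and return the text with all whitespace removed."""
    # Imported lazily so the config helper can be used without the OCR stack.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(
        Image.open(image_path), config=build_tesseract_config()
    )
    # Base64 has no meaningful whitespace, so join wrapped lines back together.
    return "".join(text.split())
```

If the image holds a single line rather than a block, passing `psm=7` may be a better fit; that is something to experiment with rather than a known-good setting.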

For these to work, the image also needs to be of fairly good quality. If the base64 text in the image is predictable and structured (a known font, a fixed character size, regular spacing), you could create your own reference image for each character and compare those against the original to identify every character in the string, bypassing the need for OCR entirely.
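The reference-image idea can be sketched as follows. This is a toy version under strong assumptions: the image is already binarized into rows of 0/1 pixels, the font is monospaced so every character occupies a cell of the same width, and you have one reference glyph per character. The names `split_cells`, `pixel_distance`, and `match_line` are made up for illustration; on real images you would more likely use NumPy arrays or OpenCV's template matching.

```python
from typing import Dict, List, Sequence

Glyph = Sequence[Sequence[int]]  # rows of 0/1 pixels


def split_cells(line: Glyph, cell_width: int) -> List[List[List[int]]]:
    """Cut a one-line image into equal-width character cells, left to right."""
    width = len(line[0])
    return [
        [list(row[x:x + cell_width]) for row in line]
        for x in range(0, width - cell_width + 1, cell_width)
    ]


def pixel_distance(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two same-sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_line(line: Glyph, templates: Dict[str, Glyph], cell_width: int) -> str:
    """Label each cell with the character whose reference glyph differs least."""
    return "".join(
        min(templates, key=lambda ch: pixel_distance(cell, templates[ch]))
        for cell in split_cells(line, cell_width)
    )


# Tiny 2x2 "font" with two characters, just to show the call shape.
templates = {"A": [[1, 0], [0, 1]], "B": [[1, 1], [0, 0]]}
line = [
    [1, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1],
]
print(match_line(line, templates, cell_width=2))  # prints ABA
```

Picking the nearest glyph by pixel difference tolerates a little noise, but it depends entirely on the cells lining up with the characters, which is where scaling and alignment come in.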

There are limitations to OCR, of course: the image needs proper scaling, contrast, and alignment, and because base64 has no redundancy, a single misread character corrupts the decoded output. I have never seen OCR used for this before, so I'm unsure where to go from there, but I think you are on the right track!
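Because one wrong character is enough to ruin the result, it may help to check the recognized text before trusting it. Here is a minimal sketch using only the standard library; `check_ocr_base64` is a hypothetical helper name, and it assumes the standard alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`) with `=` padding rather than the URL-safe variant.

```python
import base64
import binascii
import re

# Standard base64 body followed by at most two '=' padding characters.
_BASE64_RE = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def check_ocr_base64(text: str) -> bytes:
    """Strip whitespace from OCR output, run basic sanity checks, and decode."""
    candidate = "".join(text.split())
    if not _BASE64_RE.fullmatch(candidate):
        raise ValueError("contains characters outside the base64 alphabet")
    if len(candidate) % 4 != 0:
        raise ValueError("length is not a multiple of 4; a character was likely dropped or added")
    try:
        return base64.b64decode(candidate, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid base64: {exc}") from exc


print(check_ocr_base64("aGVsbG8g\nd29ybGQ="))  # prints b'hello world'
```

Note that this only catches structural damage. A substitution such as `O` for `0` or `l` for `I` still produces valid base64 that decodes to the wrong bytes, so you would need something else (a checksum, or a known structure in the decoded data) to detect those.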
