
Extracting text coordinates with Tesseract in Python without using pytesseract

I could not find an alternative to the pytesseract wrapper for Windows. I want to extract text together with its coordinates into a pandas DataFrame without using pytesseract.

tesseract_path is the path to the Tesseract executable installed on your Windows system.

img_path is the path of the image to extract text from.

tsv_path is the output path, without an extension, of the file that stores the extracted information, e.g. ../path/sample_output. Tesseract appends .tsv to it.

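import csv
import subprocess

import pandas as pd

# Call tesseract directly; passing the arguments as a list copes with spaces in paths.
# The trailing 'tsv' config makes tesseract write <tsv_path>.tsv.
subprocess.run([tesseract_path, img_path, tsv_path, '-l', 'eng', '--psm', '6', 'tsv'], check=True)

# QUOTE_NONE keeps quote characters in the recognized text from breaking the parse.
df = pd.read_csv('%s.tsv' % tsv_path, sep='\t', header=0, quoting=csv.QUOTE_NONE)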
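Once the TSV is loaded, the coordinates are in the left, top, width and height columns, with one row per page, block, paragraph, line and word. The following is a minimal sketch of how the word rows might be reduced to bounding boxes. The helper name word_boxes and the sample rows are hypothetical and only imitate the column layout Tesseract writes; in practice df would come from the generated .tsv file.

```python
import csv
import io

import pandas as pd

# Hypothetical sample in the TSV layout tesseract writes; real data comes from <tsv_path>.tsv.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t110\t92\t74\t24\t91\tworld\n"
)


def word_boxes(df):
    """Return one row per recognized word with its bounding box in pixels."""
    # Structural rows (page, block, paragraph, line) carry conf == -1 and no text.
    words = df[(df["conf"] != -1) & df["text"].notna()].copy()
    words["text"] = words["text"].astype(str)
    words = words[words["text"].str.strip() != ""]
    # left/top is the top-left corner; derive the opposite corner from width/height.
    words["right"] = words["left"] + words["width"]
    words["bottom"] = words["top"] + words["height"]
    return words[["text", "left", "top", "right", "bottom", "conf"]].reset_index(drop=True)


df = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t", quoting=csv.QUOTE_NONE)
boxes = word_boxes(df)
print(boxes)
```

Filtering on conf rather than on level keeps the sketch independent of the exact level numbering, and grouping the result by block_num, par_num and line_num (before dropping those columns) would give line-level boxes if those are needed instead.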

Reference: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
