Using Tesseract in C#

I have Tesseract installed. One button click sets the location of the tesseract.exe file, and a second button click sets the location of the image file. I now want a third button click to process the image with Tesseract, using the two stored locations. My approach is basic and crude, but it suits me. My code is:

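    private void B8_Click(object sender, EventArgs e)
    {
        // j = quoted path to tesseract.exe (set in B10_Click)
        // k = quoted path to the image file (set in B9_Click)
        // q = path of the output file expected inside folder z
        q = z + "\\" + "output.txt";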
        if (k != null)
        {
            Process pr = new Process();
            pr.StartInfo.FileName = j;
            pr.StartInfo.Arguments =  k + " "  + q;
            pr.Start();
            pr.WaitForExit();
        }
        else
        {
            MessageBox.Show("No Image File Selected.");
        }
        var filetext = File.ReadAllText(q);
        tb5.Text = filetext;
        //File.Delete(q);
    }

    private void B10_Click(object sender, EventArgs e)
    {
        openFileDialog1 = new OpenFileDialog();
        DialogResult result = openFileDialog1.ShowDialog();
        if (result == DialogResult.OK)
        {
            j = "\"" + openFileDialog1.FileName + "\"";
            MessageBox.Show("Tesseract Location Set: " + j);


        }
    }

    private void B9_Click(object sender, EventArgs e)
    {
        openFileDialog1 = new OpenFileDialog();
        DialogResult result = openFileDialog1.ShowDialog();
        if (result == DialogResult.OK)
        {
            k = "\"" + openFileDialog1.FileName + "\"";
            MessageBox.Show("Image File Location Set: " + k);
        }
    }

My 3-button click story so far:

I have successfully run the code with the first button setting the tesseract.exe path and the second button setting the image path, but the third button (see B8_Click) has an issue. It extracts the text and stores it in an "output.txt" file, but I am not able to load this text into my textbox tb5 and then delete the file.

The error I am getting is: Exception thrown: 'System.IO.FileNotFoundException' in mscorlib.dll. An unhandled exception of type 'System.IO.FileNotFoundException' occurred in mscorlib.dll. Could not find file 'C:\\Users\\ambij\\Desktop\\try\\output.txt'. I don't understand this, because there is an output.txt file in that folder.

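The following was written for Tesseract 3.05.02 and may also work with a later version. It calls the engine through the Tesseract .NET wrapper (the TesseractEngine and Pix types) instead of launching tesseract.exe, so no intermediate output.txt file is involved.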

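using System.Windows.Forms; // MessageBox
using Tesseract;            // TesseractEngine, Pix (assumes the Tesseract .NET wrapper is referenced)

private void RunIt()
{
    string tessDataPath = yourTessDataPath; // Placeholder: folder holding the tessdata files (e.g. eng.traineddata)
    string imagePath = yourImagePath; // Placeholder: the image file location
    string theTextFromTheImage = DoOCR(tessDataPath, imagePath);
    // Some formatting may be required - OCR isn't perfect
    MessageBox.Show(theTextFromTheImage);
}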

private string DoOCR(string tessdataPath, string filePath)
{
    string returnText = "";
    using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
    {
        // engine.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"); // Only regular letters
        string theVersion = engine.Version; // Optional, but useful
        using (var img = Pix.LoadFromFile(filePath))
        {
            using (var page = engine.Process(img))
            {
                returnText = page.GetText();

            }
        }
    }
    return returnText;
}
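
If you would rather keep launching tesseract.exe with Process, the FileNotFoundException is most likely explained by how the command line names its output: the second argument is an output base, and Tesseract appends ".txt" to it. Passing "output.txt" therefore produces "output.txt.txt", which Windows Explorer may display as "output.txt" when known file extensions are hidden. Below is a minimal sketch of that fix, not a definitive implementation. The class name TesseractCli and the parameters exePath, imagePath and outputFolder are hypothetical stand-ins for the question's j, k and z, and they are assumed to be unquoted paths.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; the names are illustrative and not part of any library.
static class TesseractCli
{
    // The Tesseract CLI appends ".txt" to the output base it is given,
    // so this is the file that actually appears on disk.
    public static string ExpectedOutputFile(string outputBase)
    {
        return outputBase + ".txt";
    }

    // Runs tesseract.exe on one image and returns the recognized text.
    // All three paths are assumed to be unquoted.
    public static string Run(string exePath, string imagePath, string outputFolder)
    {
        if (string.IsNullOrEmpty(imagePath))
        {
            throw new ArgumentException("No image file selected.", "imagePath");
        }

        // Output base WITHOUT an extension; Tesseract adds ".txt" itself.
        string outputBase = Path.Combine(outputFolder, "output");

        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            // Quote both arguments so paths containing spaces survive.
            Arguments = "\"" + imagePath + "\" \"" + outputBase + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
        }

        string outputFile = ExpectedOutputFile(outputBase);
        string text = File.ReadAllText(outputFile);
        File.Delete(outputFile); // clean up the temporary file
        return text;
    }
}
```

In the question's handler this would be used roughly as tb5.Text = TesseractCli.Run(exePath, imagePath, outputFolder);. Because the read happens inside Run, after the image check, it also avoids the original code's call to File.ReadAllText when no image has been selected.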
