Tesseract in VSTO Office add-in

I am developing an Office add-in in C# that reads an image and uses Tesseract to extract text from it. However, I am not able to start the Tesseract engine. I tried putting the tessdata, x64, and x86 folders in the folder that Visual Studio creates during debugging. The path structure looks like this:

"AppData\Local\assembly\dl3\randomstring\randomstring\randomstring\randomstring"

I figured the Tesseract engine might need to read its data files from the folder that contains the DLL, but that did not work.

I also have the required folders in bin\debug, with "Copy to Output Directory" set to "Copy always".

Below is the error dialog shown by Excel.

[Image: screenshot of the error dialog, not reproduced here]

Below is the code I tried.

private string CurrentDirectory()
{
    Assembly assemblyInfo = Assembly.GetExecutingAssembly();
    string assemblyLocation = assemblyInfo.Location;

    return assemblyLocation;
}

public string GetText(Bitmap bmp)
{
    var path = Path.Combine(folder, "tessdata"); // this is the project folder

    var RandomPath = Path.Combine(Path.GetDirectoryName(CurrentDirectory()), "tessdata"); // this is the Visual Studio created folder

    string recognizedText = string.Empty;
    var engine = new TesseractEngine(RandomPath, "eng", EngineMode.TesseractAndLstm);
    bmp.Save("tempFile.jpeg", System.Drawing.Imaging.ImageFormat.Jpeg);
    // Perform OCR
    using (Pix img = Pix.LoadFromFile("tempFile.jpeg"))
    {
        using (Page recognizedPage = engine.Process(img))
        {
            recognizedText = recognizedPage.GetText();
        }
    }
    File.Delete("tempFile.jpeg");
    return recognizedText;
}

Any help is greatly appreciated.

It seems the assemblies cannot be found. The error message refers to the https://github.com/charlesw/tesseract/wiki/error-1 page, which describes possible causes.
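One thing worth checking, as a rough sketch rather than a confirmed fix: in the question, Assembly.Location points into the AppData\Local\assembly\dl3 folder, which is a shadow copy of the add-in. On the .NET Framework, Assembly.CodeBase normally reports where the assembly was originally loaded from, so it may be a better base for locating tessdata, x86 and x64. The class and method names below (TessPathProbe, DirectoryFromCodeBase, Report) are made up for illustration, and the sketch only reports whether the folders exist.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper (not part of Tesseract or VSTO): finds the folder the
// add-in assembly was originally loaded from and checks for the data folders.
public static class TessPathProbe
{
    // Converts a CodeBase-style file URI into the directory that contains the file.
    public static string DirectoryFromCodeBase(string codeBase)
    {
        var uri = new Uri(codeBase);
        return Path.GetDirectoryName(uri.LocalPath);
    }

    // Assumption: the add-in is shadow-copied, so CodeBase (not Location)
    // refers to the original install or bin\Debug folder.
    public static string OriginalAssemblyDirectory()
    {
        return DirectoryFromCodeBase(Assembly.GetExecutingAssembly().CodeBase);
    }

    // Writes one line per expected folder, saying whether it exists.
    public static void Report()
    {
        string baseDir = OriginalAssemblyDirectory();
        foreach (string name in new[] { "tessdata", "x86", "x64" })
        {
            string candidate = Path.Combine(baseDir, name);
            Console.WriteLine("{0}: {1}", candidate,
                Directory.Exists(candidate) ? "found" : "missing");
        }
    }
}
```

If the folders are reported as found, the tessdata path built from OriginalAssemblyDirectory() could be passed to the TesseractEngine constructor in place of RandomPath from the question. Whether the native DLLs in x86 and x64 are then picked up as well depends on how the wrapper searches for them, which the wiki page above covers.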

You can try the Assembly Binding Log Viewer, which displays details for assembly binds. This information helps you diagnose why the .NET Framework cannot locate an assembly at run time. These failures are usually the result of an assembly deployed to the wrong location, a native image that is no longer valid, or a mismatch in version numbers or cultures.
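If running the log viewer is inconvenient, a similar hint can be collected from inside the add-in. The sketch below subscribes to AppDomain.AssemblyResolve, which the runtime raises when a managed assembly bind fails, and records the requested name. BindFailureLogger is a hypothetical name, and this only covers managed assemblies: it will not report a native DLL that fails to load.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical diagnostic helper: records the names of managed assemblies
// that the runtime failed to resolve on its own.
public static class BindFailureLogger
{
    public static readonly List<string> FailedNames = new List<string>();

    // Call once, early in add-in startup.
    public static void Install()
    {
        AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;
    }

    private static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        lock (FailedNames)
        {
            FailedNames.Add(args.Name);
        }
        // Returning null leaves the outcome of the bind unchanged;
        // this handler only observes the failure.
        return null;
    }
}
```

After calling BindFailureLogger.Install() and reproducing the error, the contents of FailedNames show which assembly names the runtime was looking for. Requests for satellite resource assemblies may also appear in the list and can usually be ignored.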

