C# Possible Memory Leak?

I have an app written in C# (VS2010) that performs OCR using the tesseract 3.02 DLL and Charles Weld's Tesseract .NET wrapper.

I think I have a memory leak, and it seems to be in the area of code where the Pix object is allocated. I am taking a PDF, converting it to a grayscale PNG, then loading that into a Pix object for OCR. When it works, it works really well. The image is large in dimensions (5100 or so pixels on each side) but small on disk (only 500 KB or so).

My code:

Init engine at app startup:

private TesseractEngine engine = new TesseractEngine(@"./tessdata/", "eng+fra", EngineMode.Default);

Method to convert PDF to PNG, then calls:

// Load the image file created earlier into a Pix object.
Pix pixImage = Pix.LoadFromFile(Path.Combine(textBoxSourceFolder.Text, sourceFile));

And then calls the following:

// Perform OCR on the image referenced in the Pix object.
private String PerformImageOCR(Pix pixImage)
{
    int safety = 0;

    do
    {
        try
        {
            // Deskew the image.
            pixImage = pixImage.Deskew();
            //pixImage.Save(@"c:\temp\img_deskewed.png", ImageFormat.Png); // Debugging - verify image deskewed properly to allow good OCR.

            string text = "";

            // Use the tesseract OCR engine to process the image
            using (var page = engine.Process(pixImage))
            {
                // and then extract the text.
                text = page.GetText();
            }

            return text;
        }
        catch (Exception e)
        {
            MessageBox.Show(string.Format("There was an error performing OCR on image, Retrying.\n\nError:\n{0}", e.Message), "Error", MessageBoxButtons.OK);
        }
    } while (++safety < 3);

    return string.Empty;
}

I have observed that memory usage jumps by about 31MB when the Pix object is created, jumps again while OCR is being performed, then finally settles about 33MB higher than before it started. For example, if the app was consuming 50MB after loading, loading the Pix object causes memory usage to jump to about 81MB. Performing OCR makes it spike to 114+MB; then, once the process is complete and the results are saved, memory usage settles at about 84MB. Repeating this over many files in a folder eventually causes the app to barf at 1.5GB or so consumed.

I think my code is okay, but there's something somewhere that's holding onto resources.

The tesseract and leptonica DLLs are written in C, and I have recompiled them with VS2010 along with the latest or recommended image lib versions, as appropriate. What I'm unsure of is how to diagnose a memory leak in a C DLL from a C# app using Visual Studio. If I were using Linux, I'd use a tool such as valgrind to help me spot the leak, but my leak-sniffing skills on the Windows side are sadly lacking. Looking for advice on how to proceed.

I'm not familiar with Tesseract or the wrapper, but for memory profiling issues, if you have Visual Studio 2012/2013, you can use the Performance Wizard. I know it's available in Ultimate, but I'm not sure about other editions.


Either something in your code or something in the wrapper is not disposing of an unmanaged object properly. My guess would be that it's in the wrapper. Running the Performance Wizard or another C# memory profiler (like JetBrains dotTrace) may help you track it down.
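A quick check you can do before reaching for a profiler, sketched here on the assumption that the growth is in native (leptonica/tesseract) memory rather than on the managed heap: log the managed heap size next to the process's private bytes after each file. If private bytes keep climbing while the managed figure stays roughly flat, the leak is in unmanaged allocations. MemoryProbe and Log are made-up names for illustration, not part of any library.

```csharp
using System;
using System.Diagnostics;

static class MemoryProbe
{
    // Hypothetical helper: call once after each file has been processed.
    public static void Log(string label)
    {
        // Force a full collection so collectable managed garbage is not counted.
        long managed = GC.GetTotalMemory(true);

        using (Process proc = Process.GetCurrentProcess())
        {
            // Private bytes cover both managed and native allocations.
            long privateBytes = proc.PrivateMemorySize64;

            // Written to the Visual Studio Output window in debug builds.
            Debug.WriteLine(string.Format(
                "{0}: managed={1:N0} bytes, private={2:N0} bytes",
                label, managed, privateBytes));
        }
    }
}
```

Calling MemoryProbe.Log(sourceFile) after each OCR pass gives you a per-file trend without any extra tooling.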

Reading your code, I do not see you disposing of your Pix pixImage anywhere. That is what is taking up all the resources when you process many images. Before you return your string result, you should call the Dispose method on your pixImage. That should reduce the amount of resources used by your program.
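A minimal sketch of what that could look like, assuming Pix implements IDisposable in the wrapper version you are using and that Deskew() returns a new Pix rather than modifying the original (if so, reassigning pixImage in your method drops the last reference to the original without disposing it). OcrRunner and OcrFile are hypothetical names; the engine setup is copied from your question.

```csharp
using System.IO;
using Tesseract;

class OcrRunner
{
    private readonly TesseractEngine engine =
        new TesseractEngine(@"./tessdata/", "eng+fra", EngineMode.Default);

    // Hypothetical method: loads, deskews, and OCRs a single image file.
    public string OcrFile(string folder, string sourceFile)
    {
        // Each Pix wraps native leptonica memory, so dispose each one explicitly.
        using (Pix original = Pix.LoadFromFile(Path.Combine(folder, sourceFile)))
        using (Pix deskewed = original.Deskew()) // assumed to return a new Pix
        using (Page page = engine.Process(deskewed))
        {
            return page.GetText();
        }
    }
}
```

The stacked using statements dispose the page, the deskewed image, and the original image in reverse order, even if an exception is thrown partway through.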

