
Looking for a [maybe] design pattern

Here's my scenario. The user selects a document in my software, and my software extracts some key data from it. The software handles two formats: PDF and DOCX. For each format there are several templates, and the uploaded document is supposed to belong to one of them. I don't know whether this is a well-known problem or whether there is an established design pattern for it (that's why I'm on SO). Here's what I have designed so far:

Since each template has specific structure/contents, I'm thinking of creating separate classes for each template. There will be a top-level interface called IExtractor, then there will be two top-level classes called PdfExtractor and DocxExtractor, each implementing the IExtractor interface. Any functionality common to all PDF (or DOCX) templates will go into these parent classes.

Below these two parent classes, there will be several template classes, one for each template. For example, a class called Template571_PdfExtractor inherits from PdfExtractor and has methods specific to Template 571, but provides results in the same form as any other extractor.

I'm using C# 4.0 if that matters. Here's the skeleton:

The interface:

interface IExtractor
{
    void ExtractDocument(System.IO.FileInfo document, dsExtract dsToFill);
}

The two parent classes:

public class DocxExtractor : IExtractor 
{
    public virtual void ExtractDocument(System.IO.FileInfo document, dsExtract dsToFill)
    {
    }
}

public class PdfExtractor : IExtractor 
{
    public virtual void ExtractDocument(System.IO.FileInfo document, dsExtract dsToFill)
    {
    }
}

One of the concrete classes:

public class Template571_PdfExtractor : PdfExtractor
{
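    // "override" (not "virtual") so that calls through PdfExtractor or IExtractor reach this implementation.
    public override void ExtractDocument(System.IO.FileInfo document, dsExtract dsToFill)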
    {
    }
}

Now there are a few key questions I'm not sure about. All of them revolve around the same problem: I don't know how and where to instantiate the concrete (template) class. I can use the file extension to decide whether I need to go down the PdfExtractor branch or the DocxExtractor branch. After that, it is the file's contents that tell me which template the user's document belongs to. So where do I put this "decision" code? My idea was to put it in the PdfExtractor class (or DocxExtractor, for that matter). Is that the correct way?

Sorry I got a bit long, but I didn't know how to fully describe my situation. Thanks for your ideas.

Shujaat

Once you dig deeper into design patterns and such you'll surely find out that most of the time there is no one correct way to implement something...

One possible way would be to create so-called factory classes: one for PdfExtractors and another for DocxExtractors. Each factory class would probably have a single static method, like this:

public final class PdfExtractorFactory {
   public static PdfExtractor getExtractor(String filename) { ... }

   ... // constructor, or singleton getter here
}

The logic that decides which concrete subclass of PdfExtractor to return (i.e., which template to use) would then reside in the factory method. This way, neither the abstract base class PdfExtractor nor its subclasses would be cluttered with this decision logic. Only the factory classes would need to know about the subclasses of PdfExtractor (or DocxExtractor, respectively); the rest of your code would be completely unaware of the concrete subclasses, since the factories hand out instances typed as the superclasses.
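To make that concrete in C#, here is a minimal sketch of such a static factory. Several things in it are assumptions made for illustration and are not from the question: the factory receives the document's text already extracted (by whatever PDF library you use) rather than a file name, the marker string "Form 571" is a made-up way to recognize the template, dsExtract is reduced to an empty stand-in, and PdfExtractor is declared abstract.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Empty stand-in for the asker's dsExtract type; its real shape is not shown in the question.
public class dsExtract { }

public interface IExtractor
{
    void ExtractDocument(FileInfo document, dsExtract dsToFill);
}

// Assumption: the PDF base class is abstract here; shared PDF helpers would live in it.
public abstract class PdfExtractor : IExtractor
{
    public abstract void ExtractDocument(FileInfo document, dsExtract dsToFill);
}

public class Template571_PdfExtractor : PdfExtractor
{
    public override void ExtractDocument(FileInfo document, dsExtract dsToFill)
    {
        // Template 571 specific extraction would go here.
    }
}

public static class PdfExtractorFactory
{
    // Hypothetical detection rules: a marker expected somewhere in the document text,
    // paired with a delegate that creates the matching extractor.
    private static readonly List<KeyValuePair<string, Func<PdfExtractor>>> Rules =
        new List<KeyValuePair<string, Func<PdfExtractor>>>
        {
            new KeyValuePair<string, Func<PdfExtractor>>(
                "Form 571", () => new Template571_PdfExtractor())
        };

    // Returns the extractor for the first rule whose marker occurs in the text.
    public static PdfExtractor GetExtractor(string documentText)
    {
        if (documentText == null)
            throw new ArgumentNullException("documentText");

        foreach (var rule in Rules)
        {
            if (documentText.IndexOf(rule.Key, StringComparison.OrdinalIgnoreCase) >= 0)
                return rule.Value();
        }

        throw new NotSupportedException("No known PDF template matches this document.");
    }
}
```

The caller only ever sees a PdfExtractor. Supporting a new template would mean adding one subclass and one entry to the rule list, without touching the base class or the calling code.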

Since you're likely to need only a single instance each of PdfExtractorFactory and DocxExtractorFactory, you might choose to implement these factory classes as singletons.

Update: Of course, you can use either a static factory method, or the Singleton pattern with a non-static factory method (you don't need both).
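For comparison, this is roughly what the singleton variant with a non-static factory method could look like in C#, together with the file-extension check from the question. Treat it as a sketch under assumptions: IExtractorFactory and ExtractorFactories are names invented here, the extractor types are reduced to empty stubs, and template detection is left out.

```csharp
using System;
using System.IO;

// Stubs only; the real members are the ones shown in the question.
public interface IExtractor { }
public class PdfExtractor : IExtractor { }
public class DocxExtractor : IExtractor { }

// Hypothetical common interface, so callers can treat both factories alike.
public interface IExtractorFactory
{
    IExtractor GetExtractor(FileInfo document);
}

public sealed class PdfExtractorFactory : IExtractorFactory
{
    private static readonly PdfExtractorFactory instance = new PdfExtractorFactory();

    private PdfExtractorFactory() { }

    public static PdfExtractorFactory Instance { get { return instance; } }

    public IExtractor GetExtractor(FileInfo document)
    {
        // Template detection omitted; a fuller version would return a template subclass.
        return new PdfExtractor();
    }
}

public sealed class DocxExtractorFactory : IExtractorFactory
{
    private static readonly DocxExtractorFactory instance = new DocxExtractorFactory();

    private DocxExtractorFactory() { }

    public static DocxExtractorFactory Instance { get { return instance; } }

    public IExtractor GetExtractor(FileInfo document)
    {
        // Template detection omitted, as above.
        return new DocxExtractor();
    }
}

public static class ExtractorFactories
{
    // Picks the factory from the file extension (FileInfo.Extension includes the dot).
    public static IExtractorFactory ForFile(FileInfo document)
    {
        switch (document.Extension.ToLowerInvariant())
        {
            case ".pdf": return PdfExtractorFactory.Instance;
            case ".docx": return DocxExtractorFactory.Instance;
            default:
                throw new NotSupportedException("Unsupported file type: " + document.Extension);
        }
    }
}
```

A call would then read ExtractorFactories.ForFile(file).GetExtractor(file). The main thing the singleton buys you over the static method is that a factory can be passed around behind an interface, which helps if you want to swap it out in tests.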


 