
What is the best approach to generalize and aggregate XML dumps in C#?

Original post: 2010-12-17 02:08:06 | Tags: c#, generics, linq-to-xml, design-patterns

Here is the business part of the issue:

Several different companies send an XML dump of the information to be processed.
The information sent by the companies is similar, but not exactly the same.
Several more companies will soon be enlisted and will start sending information.

Now, the technical part of the problem: I want to write a generic solution in C# to accommodate this information for processing. I will be transforming the XML in my C# class(es) to fit into my database model.

Is there any pattern or solution that handles this generically, so that I do not need to change my solution when many more companies are added later?

What would be the best approach to writing my parser/transformer?

6 solutions

This is how I have done something similar in the past.

As long as each company has its own fixed format for its XML dump:

1. Have a specific XSLT for each company.
2. Have a way of indicating which dump is sourced from where (maybe a different DUMP folder for each company).
3. In your program, based on 2, select 1 and apply it to the DUMP.
4. All the XSLTs will transform the XML to your one standard database schema.
5. Save this to your DB.
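
To make the steps above concrete, here is a minimal sketch of the "pick the stylesheet by source folder and apply it" part. The names DumpTransformer and TransformDump, and the conventions that the dump's parent folder is named after the company and that the stylesheet is called <company>.xslt, are assumptions made for illustration; saving the result to the database is left out.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class DumpTransformer
{
    // Hypothetical convention: a dump at .../dumps/acme/file.xml comes from
    // company "acme", whose stylesheet is <xsltDirectory>/acme.xslt.
    public static string TransformDump(string dumpPath, string xsltDirectory)
    {
        string fullPath = Path.GetFullPath(dumpPath);
        string company = Path.GetFileName(Path.GetDirectoryName(fullPath));
        string xsltPath = Path.Combine(xsltDirectory, company + ".xslt");
        if (!File.Exists(xsltPath))
        {
            throw new FileNotFoundException("No stylesheet for company " + company, xsltPath);
        }

        // Load the company-specific stylesheet.
        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        // Apply it; the writer is disposed (and so flushed) before the text is read.
        var result = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(result, xslt.OutputSettings))
        {
            xslt.Transform(fullPath, writer);
        }
        return result.ToString(); // XML in the one standard schema
    }
}
```

Under these assumptions, adding a company means dropping one more .xslt file into the stylesheet directory; the C# code itself does not change.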

Each new company is at most a new XSLT. Where the schemas are very similar, an existing XSLT can simply be reused and then adjusted with company-specific changes.

Drawback to this approach: debugging XSLT can be a bit more painful if you do not have the right tools. However, a lot of XML editors (e.g. XML Spy) have excellent XSLT debugging capabilities.

Sounds to me like you are just asking for a design pattern (or set of patterns) that you could use to do this in a generic, future-proof manner, right?

Ideally, some of the attributes that you probably want:

Each "transformer" is decoupled from the others.
You can easily add new "transformers" without having to rewrite your main "driver" routine.
You don't need to recompile / redeploy your entire solution every time you modify a transformer, or at least not when you add a new one.

Each "transformer" should ideally implement a common interface that your driver routine knows about - call it IXmlTransformer. The responsibility of this interface is to take in an XML file and to return whatever object model / dataset you use to save to the database. Each of your transformers would implement this interface. For common logic that is shared by all transformers, you could either create a base class that they all inherit from, or (my preferred choice) have a set of helper methods which you can call from any of them.
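
As a rough sketch of what that could look like: only the name IXmlTransformer comes from the description above. OrderRecord, TransformHelpers, AcmeTransformer and the shape of the sample dump are all made up for illustration, and your real object model will differ.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical unified record that the database layer would persist.
public sealed class OrderRecord
{
    public string Company { get; set; }
    public string OrderId { get; set; }
    public decimal Amount { get; set; }
}

// The common contract that the driver routine knows about.
public interface IXmlTransformer
{
    IList<OrderRecord> Transform(XDocument dump);
}

// Shared logic lives in helper methods rather than in a base class.
public static class TransformHelpers
{
    public static decimal ParseAmount(string text)
    {
        return decimal.Parse(text.Trim(), NumberStyles.Number, CultureInfo.InvariantCulture);
    }
}

// Transformer for a hypothetical company whose dump looks like
// <orders><order no="A-1"><total>12.50</total></order></orders>
public sealed class AcmeTransformer : IXmlTransformer
{
    public IList<OrderRecord> Transform(XDocument dump)
    {
        return dump.Root.Elements("order")
            .Select(o => new OrderRecord
            {
                Company = "Acme",
                OrderId = (string)o.Attribute("no"),
                Amount = TransformHelpers.ParseAmount((string)o.Element("total"))
            })
            .ToList();
    }
}
```

A second company would get its own class implementing the same interface, reusing TransformHelpers where the parsing rules happen to match.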

I would start by using a Factory to create each "transformer" from your main driver routine. The factory could use reflection to interrogate all the assemblies it can see, or something like MEF, which could do a lot of the work for you. Your driver logic should use the factory to create all the transformers and store them.
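
A minimal sketch of the reflection flavour of such a factory follows (MEF is not shown). TransformerFactory, CreateAll and the two sample transformer classes are hypothetical names, and the sketch assumes every transformer has a public parameterless constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// The transformer contract; its members are omitted because only
// discovery is illustrated here.
public interface IXmlTransformer { }

// Two hypothetical transformers that the factory should find.
public sealed class AcmeTransformer : IXmlTransformer { }
public sealed class GlobexTransformer : IXmlTransformer { }

public static class TransformerFactory
{
    // Creates one instance of every concrete IXmlTransformer found in the
    // given assemblies. Assumes a public parameterless constructor.
    public static IList<IXmlTransformer> CreateAll(params Assembly[] assemblies)
    {
        return assemblies
            .SelectMany(a => a.GetTypes())
            .Where(t => typeof(IXmlTransformer).IsAssignableFrom(t)
                        && t.IsClass
                        && !t.IsAbstract
                        && t.GetConstructor(Type.EmptyTypes) != null)
            .Select(t => (IXmlTransformer)Activator.CreateInstance(t))
            .ToList();
    }
}
```

The driver would call something like TransformerFactory.CreateAll(typeof(AcmeTransformer).Assembly) once at start-up and keep the list. Loading transformers from separate plug-in assemblies is what would let you add one without recompiling the driver.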

Then you need some logic and a mechanism to "look up" the right transformer for each XML file received - perhaps each XML file has a header that you could use to identify it, or something similar. Again, you want to keep this decoupled from your main logic so that you can easily add new transformers without modifying the driver routine. You could e.g. supply the XML file to each transformer and ask it "can you transform this file?", and it is up to each transformer to "take responsibility" for a given file.

Every time your driver routine gets a new XML file, it looks up the appropriate transformer and runs the file through it; the result gets sent to the DB processing area. If no transformer can be found, you dump the file in a directory for interrogation later.
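
Roughly, the lookup and the driver loop could be sketched like this. CanTransform is one possible way to express the "can you transform this file?" question; Driver, OrderRecord, the acmeOrders root element and the saveToDatabase callback are all invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical unified record handed to the DB processing area.
public sealed class OrderRecord
{
    public string Company { get; set; }
    public string OrderId { get; set; }
}

public interface IXmlTransformer
{
    bool CanTransform(XDocument dump);             // "can you transform this file?"
    IList<OrderRecord> Transform(XDocument dump);
}

// Hypothetical transformer that recognises its dumps by the root element name.
public sealed class AcmeTransformer : IXmlTransformer
{
    public bool CanTransform(XDocument dump)
    {
        return dump.Root != null && dump.Root.Name.LocalName == "acmeOrders";
    }

    public IList<OrderRecord> Transform(XDocument dump)
    {
        return dump.Root.Elements("order")
            .Select(o => new OrderRecord { Company = "Acme", OrderId = (string)o.Attribute("no") })
            .ToList();
    }
}

public sealed class Driver
{
    private readonly IList<IXmlTransformer> _transformers;
    private readonly string _unmatchedDirectory;
    private readonly Action<IList<OrderRecord>> _saveToDatabase;

    public Driver(IList<IXmlTransformer> transformers, string unmatchedDirectory,
                  Action<IList<OrderRecord>> saveToDatabase)
    {
        _transformers = transformers;
        _unmatchedDirectory = unmatchedDirectory;
        _saveToDatabase = saveToDatabase;
    }

    // Returns true if some transformer took responsibility for the file.
    public bool Process(string dumpPath)
    {
        XDocument dump = XDocument.Load(dumpPath);
        IXmlTransformer transformer = _transformers.FirstOrDefault(t => t.CanTransform(dump));
        if (transformer == null)
        {
            // Nobody claimed the file: park a copy for later inspection.
            Directory.CreateDirectory(_unmatchedDirectory);
            File.Copy(dumpPath, Path.Combine(_unmatchedDirectory, Path.GetFileName(dumpPath)), true);
            return false;
        }
        _saveToDatabase(transformer.Transform(dump));
        return true;
    }
}
```

The driver never names a concrete transformer, so supporting a new company only means adding one more entry to the list it is given.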

I would recommend reading a book like Agile Principles, Patterns and Practices by Robert Martin (http://www.amazon.co.uk/Agile-Principles-Patterns-Practices-C/dp/0131857258), which gives good examples of appropriate design patterns for situations like yours, e.g. Factory and DIP.

Hope that helps!

The solution proposed by InSane is likely the most straightforward and is definitely an XML-friendly approach.

If you are looking to write your own code to convert the different data formats, then implement multiple reader entities, each of which reads data from one distinct format and transforms it to a unified format; your main code would then work with these entities in a unified way, i.e. by saving to the database.

Search for ETL (Extract-Transform-Load) to get more information: "What model/pattern should I use for handling multiple data sources?", http://en.wikipedia.org/wiki/Extract,_transform,_load

Using XSLT, as proposed in the currently most upvoted answer, just moves the problem from C# to XSLT.

You are still changing the pieces that process the XML, and you are still exposed to how well or poorly that code is structured, whether it is C# code or rules in the XSLT.

Regardless of whether you keep it in C# or go with XSLT for those bits, the key is to separate out the transformation of the XML you receive from the various companies into a single common format, whether that's an intermediate XML or a set of classes into which you load the data you are processing.

Whatever you do, avoid getting clever and trying to define your own generic transformation layer. If that's what you want, do use XSLT, since that's what it's for. If you go with C#, keep it simple, with a transformation class for each company that implements the simplest possible interface.

If you go the C# way, limit any reuse between the transformations to composition; don't even think of using inheritance for it. This is one of the areas where it gets very ugly quickly if you go that way.

Have you considered BizTalk Server?

Just playing the fence here and offering another solution for other readers.

The easiest way to get the data into your models within C# is to use XSLT to convert each company's data into a serialized form of your models. These are the basic steps I would take:

Create a complete model of all your data and use XmlSerializer to write out the model.
Create an XSLT that takes Company A's data and converts it into a valid serialized XML model of your data. Use the previously created XML file as a reference.
Use Deserialize on the new XML you just created. You will now have a reference to your model object containing all the data from the company.
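
The three steps could be sketched roughly as below. The Order model, ModelLoader and its method names are placeholders for illustration; a real model would be much larger, and the XSLT would normally live in a file per company rather than in a string.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;
using System.Xml.Xsl;

// Hypothetical model; XmlSerializer needs a public type with a
// public parameterless constructor.
public class Order
{
    public string Id { get; set; }
    public decimal Total { get; set; }
}

public static class ModelLoader
{
    // Step 1: write out a sample model to see the XML shape the XSLT must produce.
    public static string SerializeSample(Order sample)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, sample);
            return writer.ToString();
        }
    }

    // Steps 2 and 3: run the company's XML through its XSLT, then
    // deserialize the result straight into the model.
    public static Order Load(string companyXml, string xsltText)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(xsltReader);
        }

        var transformed = new StringBuilder();
        using (XmlReader input = XmlReader.Create(new StringReader(companyXml)))
        using (XmlWriter output = XmlWriter.Create(transformed, xslt.OutputSettings))
        {
            xslt.Transform(input, output);
        }

        var serializer = new XmlSerializer(typeof(Order));
        using (var reader = new StringReader(transformed.ToString()))
        {
            return (Order)serializer.Deserialize(reader);
        }
    }
}
```

The output of SerializeSample is the reference file mentioned in the second step: the stylesheet for each company only has to reproduce that element structure.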