
Techniques for data comparison between different schemas

Are there techniques for comparing the same data stored in different schemas? The situation is something like this: I have a database with schema A, which stores the data for a feature in, say, 5 tables. Schema A is migrated to schema B during an upgrade process. During the upgrade, some transformation logic is applied and the data ends up in 7 tables in schema B. What I'm after is some way to verify data integrity; basically, I would have to compare the two schemas while factoring in the transformation logic.

Short of writing custom T-SQL stored procedures to compare the data, is there an alternative method? I'm leaning towards Python to automate this; are there any Python modules that would help me out?

To better illustrate my question, the following diagram is a rough picture of one of the many data sets I would need to compare. Properties 1, 2, 3 and 4 are migrated from the source schema to the destination schema, but they are spread across different tables.

Table1Src                             Table1Dest
  |                                       |
  --ID(Primary Key)                       --ID(Primary Key)
  --Property1                             --Property1
  --Property2                             --Property5
  --Property3                             --Property6

Table2Src                             Table2Dest
  |                                       |
  --ID(Foreign Key->Table1Src)            --ID(Foreign Key->Table1Dest)
  --Property4                             --Property2
                                          --Property3

                                      Table3Dest
                                          |
                                          --ID(Foreign Key->Table1Dest)
                                          --Property4
                                          --Property7

Create views on both schemas that translate the data into the same business representation. Export these views to flat files, and then you can use any plain vanilla file diff utility to compare them and point out the differences.
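A minimal sketch of the views-plus-diff approach, under several assumptions: SQLite stands in for the real databases, both schemas sit in one database using the table names from the diagram, and the view names `v_src_business` / `v_dest_business` and the helpers `export_view` / `diff_views` are hypothetical. Each view joins its schema's tables back into one row per ID holding Property1 to Property4; Python's `csv` module stands in for the flat-file export and `difflib` for the external diff tool.

```python
import csv
import difflib
import io
import sqlite3

# Hypothetical fixture: the tables from the question's diagram, both schemas in
# one in-memory SQLite database standing in for the real databases.
SCHEMA = """
CREATE TABLE Table1Src  (ID INTEGER PRIMARY KEY, Property1, Property2, Property3);
CREATE TABLE Table2Src  (ID INTEGER REFERENCES Table1Src(ID), Property4);
CREATE TABLE Table1Dest (ID INTEGER PRIMARY KEY, Property1, Property5, Property6);
CREATE TABLE Table2Dest (ID INTEGER REFERENCES Table1Dest(ID), Property2, Property3);
CREATE TABLE Table3Dest (ID INTEGER REFERENCES Table1Dest(ID), Property4, Property7);

-- Both views expose the same business shape: ID, Property1..Property4.
CREATE VIEW v_src_business AS
  SELECT t1.ID AS ID, t1.Property1 AS Property1, t1.Property2 AS Property2,
         t1.Property3 AS Property3, t2.Property4 AS Property4
  FROM Table1Src t1 LEFT JOIN Table2Src t2 ON t2.ID = t1.ID;
CREATE VIEW v_dest_business AS
  SELECT t1.ID AS ID, t1.Property1 AS Property1, t2.Property2 AS Property2,
         t2.Property3 AS Property3, t3.Property4 AS Property4
  FROM Table1Dest t1
  LEFT JOIN Table2Dest t2 ON t2.ID = t1.ID
  LEFT JOIN Table3Dest t3 ON t3.ID = t1.ID;
"""


def export_view(conn, view):
    """Dump a view to CSV lines, ordered by ID so the two exports line up."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    cur = conn.execute(f"SELECT * FROM {view} ORDER BY ID")  # view name is trusted
    writer.writerow([d[0] for d in cur.description])  # header row
    writer.writerows(cur)
    return buf.getvalue().splitlines(keepends=True)


def diff_views(conn):
    """Return unified-diff lines between the two business views (empty if equal)."""
    return list(difflib.unified_diff(
        export_view(conn, "v_src_business"),
        export_view(conn, "v_dest_business"),
        fromfile="schema_a.csv", tofile="schema_b.csv"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative rows: ID 2 was migrated with a wrong Property3.
    conn.executemany("INSERT INTO Table1Src VALUES (?, ?, ?, ?)", [(1, "a", "b", "c"), (2, "d", "e", "f")])
    conn.executemany("INSERT INTO Table2Src VALUES (?, ?)", [(1, "x"), (2, "y")])
    conn.executemany("INSERT INTO Table1Dest VALUES (?, ?, ?, ?)", [(1, "a", None, None), (2, "d", None, None)])
    conn.executemany("INSERT INTO Table2Dest VALUES (?, ?, ?)", [(1, "b", "c"), (2, "e", "WRONG")])
    conn.executemany("INSERT INTO Table3Dest VALUES (?, ?, ?)", [(1, "x", None), (2, "y", None)])
    print("".join(diff_views(conn)), end="")
```

Run as a script, it prints a unified diff whose only `-`/`+` pair is the ID 2 row. Ordering both exports by the same key is what makes a line-based diff meaningful. Against the real databases, the same view definitions could be created in each database and read through a DB-API driver such as pyodbc; for large tables, writing real files and using an external diff tool will likely scale better than `difflib`.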

Basically, you should create object representations for both schema versions and then compare the objects. This is easiest if they all fit into memory simultaneously; if not, you need to iterate over all objects in one representation, fetch the corresponding object from the other representation, compare the two, and then do the same in the opposite direction.
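The iterate-and-fetch procedure can be sketched as follows, assuming each representation offers two hypothetical callables: one that yields `(key, record)` pairs and one that fetches a single record by key (returning `None` if absent). The function name and the problem-tuple format are illustrative choices, not an established API.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# A business-level record, e.g. (Property1, Property2, Property3, Property4).
Record = tuple


def compare_representations(
    iter_a: Callable[[], Iterable[Tuple[Hashable, Record]]],
    fetch_b: Callable[[Hashable], Optional[Record]],
    iter_b: Callable[[], Iterable[Tuple[Hashable, Record]]],
    fetch_a: Callable[[Hashable], Optional[Record]],
) -> List[tuple]:
    """Compare two representations one object at a time.

    iter_x() yields (key, record) pairs; fetch_x(key) returns the matching
    record or None. Neither side has to be fully loaded into memory.
    """
    problems = []
    # Forward pass: every object in A must exist in B and compare equal.
    for key, a in iter_a():
        b = fetch_b(key)
        if b is None:
            problems.append(("missing_in_b", key))
        elif a != b:
            problems.append(("mismatch", key, a, b))
    # Reverse pass: objects in B with no counterpart in A. Mismatches were
    # already reported above, so only existence is checked here.
    for key, _ in iter_b():
        if fetch_a(key) is None:
            problems.append(("missing_in_a", key))
    return problems


if __name__ == "__main__":
    # In-memory case: both sides are plain dicts keyed by ID, so dict.items
    # and dict.get serve as the iterate and fetch callables.
    schema_a = {1: ("a", "b", "c", "x"), 2: ("d", "e", "f", "y")}
    schema_b = {1: ("a", "b", "c", "x"), 2: ("d", "e", "WRONG", "y"), 3: ("g", "h", "i", "z")}
    for problem in compare_representations(schema_a.items, schema_b.get, schema_b.items, schema_a.get):
        print(problem)
```

The reverse pass only checks for existence, since a full second comparison would report every mismatch twice. When the data does not fit in memory, `iter_a` could wrap a database cursor over one schema's business query and `fetch_b` could run a keyed `SELECT` against the other schema.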

The difficult part may be obtaining the object representations; check whether SQLAlchemy can be used conveniently with your tables. SQLAlchemy is, in principle, capable of mapping existing schema definitions onto objects.
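In SQLAlchemy itself, this is what reflection does (`MetaData.reflect()`, or the automap extension for ORM classes). To show the underlying idea without a third-party dependency, the sketch below swaps in the standard library's `sqlite3`: it reads column names from the live schema with SQLite's `PRAGMA table_info` and turns each table into a `namedtuple` class. The helper names `reflect_table` and `load_objects` are hypothetical, and this is a stand-in for, not an example of, SQLAlchemy's API.

```python
import sqlite3
from collections import namedtuple


def reflect_table(conn: sqlite3.Connection, table: str):
    """Build a namedtuple class whose fields are the table's columns,
    discovered from the live schema instead of being hard-coded."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    if not columns:
        raise LookupError(f"no such table: {table}")
    # rename=True replaces column names that are not valid identifiers with _0, _1, ...
    return namedtuple(table, columns, rename=True)


def load_objects(conn: sqlite3.Connection, table: str) -> list:
    """Map every row of `table` onto an instance of its reflected class."""
    cls = reflect_table(conn, table)
    # SELECT * returns columns in the same order PRAGMA table_info lists them.
    return [cls(*row) for row in conn.execute(f'SELECT * FROM "{table}"')]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1Src (ID INTEGER PRIMARY KEY, Property1, Property2, Property3)")
    conn.execute("INSERT INTO Table1Src VALUES (1, 'a', 'b', 'c')")
    for obj in load_objects(conn, "Table1Src"):
        print(obj)  # Table1Src(ID=1, Property1='a', Property2='b', Property3='c')
```

Table names are interpolated into the SQL, so they must come from trusted code, never from user input. The resulting objects can then be assembled into business records for either schema.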

I've used SQLAlchemy successfully for migrating data from one schema to another, which (as Martin v. Löwis indicated) is a process similar to comparison, especially if you give your objects an .equals(other) method.
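The answer does not show its `.equals(other)` method, and Python's own convention is `__eq__`, which a dataclass generates automatically. A minimal sketch, assuming plain row tuples laid out as in the question's diagram: each schema gets a hypothetical factory (`from_schema_a`, `from_schema_b`) that projects its rows onto one shared `Feature` class, and a hypothetical `equals` method reports which properties differ rather than returning a bare boolean.

```python
from dataclasses import dataclass, fields
from typing import Any, List


@dataclass(frozen=True)
class Feature:
    """Business-level view of one feature: only the properties both schemas share."""
    id: int
    property1: Any
    property2: Any
    property3: Any
    property4: Any

    def equals(self, other: "Feature") -> List[str]:
        """Hypothetical equals(other): return the names of the fields that
        differ (an empty list means the objects match)."""
        return [f.name for f in fields(self) if getattr(self, f.name) != getattr(other, f.name)]


def from_schema_a(t1: tuple, t2: tuple) -> Feature:
    """Table1Src row (ID, Property1, Property2, Property3) + Table2Src row (ID, Property4)."""
    return Feature(t1[0], t1[1], t1[2], t1[3], t2[1])


def from_schema_b(t1: tuple, t2: tuple, t3: tuple) -> Feature:
    """Table1Dest (ID, Property1, Property5, Property6), Table2Dest (ID, Property2, Property3)
    and Table3Dest (ID, Property4, Property7). Property5-7 exist only in schema B,
    so they are left out of the comparison."""
    return Feature(t1[0], t1[1], t2[1], t2[2], t3[1])


if __name__ == "__main__":
    a = from_schema_a((1, "a", "b", "c"), (1, "x"))
    b = from_schema_b((1, "a", 5, 6), (1, "b", "WRONG"), (1, "x", 7))
    print(a == b)       # False (dataclass-generated __eq__)
    print(a.equals(b))  # ['property3']
```

Putting the transformation knowledge into the two factory functions keeps the comparison itself trivial; the same `Feature` objects can be fed to any of the comparison strategies discussed above.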

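Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.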

 
© 2020-2024 STACKOOM.COM