Compare the Columns of Two Rows from Different Databases

I'm trying to validate that a data transfer process is working correctly. I have source tables in one database and destination tables in a different database. I want to validate the data transfer for a specific key value.

Ultimately I want results back in a format like this:

Table     Column     Matches
----------------------------
Company   Name       Y
Company   Address    Y
Company   Phone      N

I need to keep the list of columns dynamic so that this code doesn't have to change if a column is added to one of the tables. The list of tables is known.

Right now I'm using cursors: one loops through the list of tables that I need to compare, and for each table another loops through a query that returns the list of columns. It works, but I'm concerned about the performance. I'm examining 22 tables and might need to compare multiple records in a table for the specified key value. So comparing all of the records for one key currently instantiates 30-40 cursors.

I feel like there should be a better solution, but I haven't been able to find anything that does the job while remaining as dynamic as possible.

Does anyone have any ideas for me to try? Thanks in advance!

It turns out that the fastest way for me to do this was using C#. I just pulled all of the records to be analyzed into data tables in C# and compared them in memory. That reduced my total time to under 2s for each key value analyzed, and that includes converting the output to XML in the way that I want. At this point I feel that's about as fast as I can expect it to be. Thanks to everyone for their suggestions.
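The in-memory comparison described above might look roughly like the following sketch. It is written in Python rather than C#, and everything in it is an assumption made for illustration: the `compare_rows` helper, the dictionaries standing in for rows fetched from each database, and the sample values. It compares a single pair of rows; several records per key would need an extra loop over matched pairs.

```python
from typing import Any

Row = dict[str, Any]  # one fetched record, keyed by column name


def compare_rows(table: str, source: Row, dest: Row) -> list[tuple[str, str, str]]:
    """Return (table, column, 'Y' or 'N') for every column seen on either side."""
    results = []
    # Keep the source column order, then append columns that exist only in dest.
    columns = list(source) + [c for c in dest if c not in source]
    for column in columns:
        # A column that is missing on one side is reported as a mismatch.
        matches = column in source and column in dest and source[column] == dest[column]
        results.append((table, column, "Y" if matches else "N"))
    return results


# Hypothetical sample data standing in for rows read from the two databases.
source_row = {"Name": "Acme", "Address": "1 Main St", "Phone": "555-0100"}
dest_row = {"Name": "Acme", "Address": "1 Main St", "Phone": "555-0199"}

report = compare_rows("Company", source_row, dest_row)
for table, column, flag in report:
    print(f"{table:<10}{column:<11}{flag}")
```

Because the column names come from the fetched rows themselves, a column added to a table shows up in the report without any change to the code.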

SQL only: Select the id and the column from each database's table, take the difference in both directions with EXCEPT, and merge the two results. Then you can use count() to figure out whether anything is mismatched.

I haven't done SQL in ages, but maybe this will be enough pseudocode to get you started:

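SELECT COUNT(*) AS mismatches  -- 0 means every (id, value) pair matches
FROM (
    (SELECT id, columnname FROM tablename EXCEPT SELECT id, columnname FROM othertable)
        UNION
    (SELECT id, columnname FROM othertable EXCEPT SELECT id, columnname FROM tablename)
) AS diff;  -- SQL Server requires an alias on a derived table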

You could also do an inner join and check for missing records in both directions, which would be a bit faster but more complex.
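To get from the EXCEPT/UNION count in this answer to the Table / Column / Matches output the question asks for, the same check can be run once per column, with the column list read from the catalog at run time. The sketch below is only an illustration under loud assumptions: it uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server, `ATTACH DATABASE` to model the two databases, and `PRAGMA table_info` where SQL Server would query something like `INFORMATION_SCHEMA.COLUMNS`. The `column_report` helper, the `Company` table, and the sample rows are hypothetical, and both tables are assumed to have the same columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dest")  # second database, same connection
for schema in ("main", "dest"):
    conn.execute(
        f"CREATE TABLE {schema}.Company (id INTEGER, Name TEXT, Address TEXT, Phone TEXT)"
    )
conn.execute("INSERT INTO main.Company VALUES (1, 'Acme', '1 Main St', '555-0100')")
conn.execute("INSERT INTO dest.Company VALUES (1, 'Acme', '1 Main St', '555-0199')")


def column_report(conn, table, key="id", key_value=1):
    """Return (table, column, 'Y' or 'N') for each non-key column of `table`."""
    # Read the column list at run time so that new columns are picked up.
    # Identifiers come from the catalog, not from user input.
    columns = [row[1] for row in conn.execute(f"PRAGMA main.table_info({table})")]
    report = []
    for column in columns:
        if column == key:
            continue
        # Symmetric difference of (key, column) between source and destination.
        # Each EXCEPT is wrapped in a subquery because SQLite evaluates chained
        # set operators strictly left to right.
        sql = f"""
            SELECT COUNT(*) FROM (
                SELECT * FROM (SELECT {key}, "{column}" FROM main.{table} WHERE {key} = ?
                               EXCEPT
                               SELECT {key}, "{column}" FROM dest.{table} WHERE {key} = ?)
                UNION
                SELECT * FROM (SELECT {key}, "{column}" FROM dest.{table} WHERE {key} = ?
                               EXCEPT
                               SELECT {key}, "{column}" FROM main.{table} WHERE {key} = ?)
            )
        """
        (mismatches,) = conn.execute(sql, (key_value,) * 4).fetchone()
        report.append((table, column, "Y" if mismatches == 0 else "N"))
    return report


for row in column_report(conn, "Company"):
    print(*row)
```

This still issues one query per column, so it removes the cursors but not the per-column round trips; whether that is fast enough for 22 tables would need measuring.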

Row-by-row iteration in SQL Server performs poorly, and you should avoid it whenever possible. I'd rather go "relational". For example, you can compare the tables with a FULL OUTER JOIN and check whether any of the keys are NULL. Here, obviously, if you add a column you have to change your query slightly.
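As a rough illustration of the NULL-key check, the sketch below finds rows that exist on only one side. It again uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server. Because SQLite builds older than 3.39 lack FULL OUTER JOIN, the join is emulated here with a LEFT JOIN plus the unmatched rows from the other side; on SQL Server a single FULL OUTER JOIN would take its place. The `src` and `dst` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, Phone TEXT);
    CREATE TABLE dst (id INTEGER PRIMARY KEY, Phone TEXT);
    INSERT INTO src VALUES (1, '555-0100'), (2, '555-0101');
    INSERT INTO dst VALUES (1, '555-0100'), (3, '555-0103');
""")

# Emulates: src AS s FULL OUTER JOIN dst AS d ON s.id = d.id
FULL_JOIN = """
    SELECT s.id AS src_id, d.id AS dst_id
    FROM src AS s LEFT JOIN dst AS d ON d.id = s.id
    UNION ALL
    SELECT s.id, d.id
    FROM dst AS d LEFT JOIN src AS s ON s.id = d.id
    WHERE s.id IS NULL
"""

# A NULL key on either side means the row exists in only one of the tables.
missing = conn.execute(
    f"SELECT src_id, dst_id FROM ({FULL_JOIN}) "
    "WHERE src_id IS NULL OR dst_id IS NULL "
    "ORDER BY COALESCE(src_id, dst_id)"
).fetchall()

for src_id, dst_id in missing:
    side = "source only" if dst_id is None else "destination only"
    print(src_id if src_id is not None else dst_id, side)
```

Comparing column values as well would mean adding each column to the select list and the filter, which is the query change the answer mentions.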

If you want to keep using cursors, I'd suggest you make them read-only and fast-forward and see if you gain something. Like this:

DECLARE C CURSOR FAST_FORWARD FOR
...
...
...
