
Performance in migrating a lot of data from the same DB to different DBs using application BL

I have this problem: I have an application that, every 15 minutes, downloads a file from the net and bulk-copies it into a DB.

Users of my app subscribe to this file, meaning they signal us that they want this file "copied" into their databases. I wrote "copied" because I have to apply some business logic to the data before putting it into their databases. This business logic is customer-dependent.

The problem is that the source database changes by something like 100,000 rows every 15 minutes (some are new records, some are updated, and some are deleted).

How would you tackle this problem? I tried the straightforward approach:

  1. foreach customer
  2. take new data --> apply business logic for that user --> insert into his DB
  3. take updated data --> apply BL --> update his DB
  4. take deleted data --> apply BL --> remove from his DB
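The loop above gets slow when every row is a separate round trip to the customer's DB. Batching each changeset and applying it inside a single transaction usually helps a lot. Here is a minimal sketch using Python's `sqlite3` as a stand-in for the real database; the `items` table and the `apply_bl` business-logic hook are hypothetical placeholders:

```python
import sqlite3

def sync_customer(conn, new_rows, upd_rows, del_ids, apply_bl):
    """Apply per-customer business logic to a changeset, then push all
    changes in one transaction using batched statements (a sketch)."""
    with conn:  # one transaction for the whole changeset
        conn.executemany("INSERT INTO items (id, val) VALUES (?, ?)",
                         [apply_bl(r) for r in new_rows])
        conn.executemany("UPDATE items SET val = ? WHERE id = ?",
                         [(apply_bl(r)[1], r[0]) for r in upd_rows])
        conn.executemany("DELETE FROM items WHERE id = ?",
                         [(i,) for i in del_ids])

# demo with an in-memory DB and trivial stand-in business logic
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'old'), (2, 'gone')")
sync_customer(conn,
              new_rows=[(3, "fresh")],
              upd_rows=[(1, "changed")],
              del_ids=[2],
              apply_bl=lambda row: (row[0], row[1].upper()))
print(sorted(conn.execute("SELECT id, val FROM items")))
```

The point is the shape, not the library: one transaction per customer changeset instead of one per row, with the business logic applied to the batch in memory first.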

But it takes far too long, far more than 15 minutes. Sometimes it takes hours for a single user.

What would you do?

Thanks, Marco

100,000 rows doesn't sound like much.

It depends on your business logic, but if it is mostly data transformation you could consider building an SSIS package (in MS SQL Server terms; other RDBMSs have similar tools) to import your data.

You can also take advantage of parallelism, say, by having several threads (or even virtual machines) working for you: just partition the file and process all the partitions simultaneously. Even implementing a simple map/reduce algorithm may help.
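A minimal sketch of that partitioning idea in Python, using a thread pool from the standard library; the doubling inside `process_partition` is a placeholder for real per-row business logic:

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(rows):
    # placeholder for per-row business logic
    return [r * 2 for r in rows]

rows = list(range(100))
n_workers = 4
# split the file's rows into roughly equal partitions (round-robin)
partitions = [rows[i::n_workers] for i in range(n_workers)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    results = list(pool.map(process_partition, partitions))

# merge the partial results back together
processed = sorted(r for part in results for r in part)
print(len(processed))
```

Whether threads, processes, or separate machines pay off depends on where the time actually goes (CPU in the business logic vs. waiting on the database), which is another reason to measure first.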

In any case, do some performance measurement; you really want to know WHY your data processing is so slow. Probably something in your code can be optimized a lot.

100,000 rows an hour is ridiculously slow; something is wrong there (unless, of course, you have heavy and super-complicated business logic to perform on each row).

It's difficult to say without seeing the code, but you could try profiling it with something like ANTS Performance Profiler to identify where the slowdown is occurring. If you don't want to use that, I believe Visual Studio 2010 contains a profiling tool.

Obviously you would want to run your profiling against a debug or staging build rather than your production system.
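For illustration, the same profile-first approach sketched in Python with the standard library's `cProfile` (the .NET tools above play the equivalent role there); `slow_step` is a stand-in for whatever routine you suspect:

```python
import cProfile
import io
import pstats

def slow_step(n):
    # stand-in for a suspect routine in the data pipeline
    return sum(i * i for i in range(n))

pr = cProfile.Profile()
pr.enable()
slow_step(10_000)
pr.disable()

# render the top entries sorted by cumulative time
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print("slow_step" in report)
```

Reading the cumulative-time column top-down usually points straight at the hot spot, per-row DB calls being a common culprit in pipelines like this.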

If you believe it's a database speed issue, you might want to look at how you're doing your inserts and whether any indexes or triggers are affecting DB insert speeds.
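One common pattern worth testing on the insert side: drop non-essential secondary indexes before a bulk load, insert everything in a single transaction, and rebuild the indexes afterwards, so the index is built once instead of maintained row by row. A sketch with `sqlite3` standing in for the real database (table and index names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.execute("CREATE INDEX ix_val ON t (val)")

rows = [(i, f"v{i}") for i in range(10_000)]

# bulk-load pattern: drop the secondary index, insert all rows in
# one transaction, then rebuild the index in a single pass
conn.execute("DROP INDEX ix_val")
with conn:
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
conn.execute("CREATE INDEX ix_val ON t (val)")

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)
```

On SQL Server the analogous levers are disabling/rebuilding nonclustered indexes around the load and using minimally logged bulk inserts; triggers on the target tables deserve the same scrutiny.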


 