
Performance Optimization for a .NET 3.5 Desktop Application and SQL Server 2008

I need to improve the performance of a desktop application (.NET) that reads a database and creates XML files based on XBRL (eXtensible Business Reporting Language). It uses UBMatrix to create the XBRL taxonomies.

The application works fine if the data set is small, but it takes more than 30 minutes to generate the files when the data is large. The client's data is always huge, so the application always needs a long time to generate the files.

My task is to optimise the application to reduce the time taken to create the XML files. When I checked the application, I found that it works this way:

Starts

  • Create a connection to the database.
  • Get the first set of data (this table, table1, is very large); the query returns around 15-30 K rows into a DataTable.
  • Loop from 0 to dataTable.Rows.Count:
    • Check some condition.
    • Get data from the database (this table, table2, is even larger than table1).
    • Send the data to form XBRL and write it to XML (this is done by a third-party application called UBMatrix). It is not possible to edit the code that creates the XBRL XML file.

Similarly, there are 3 to 4 sets of data to process.
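The flow above could be sketched roughly like this (the actual code is not shown, so all helper and table names here are placeholders):

```csharp
using System.Data;
using System.Data.SqlClient;

// Rough sketch of the current flow; LoadTable1/LoadTable2/WriteXbrl
// are hypothetical stand-ins for the real code.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    DataTable table1 = LoadTable1(conn);            // ~15-30 K rows

    for (int i = 0; i < table1.Rows.Count; i++)
    {
        DataRow row = table1.Rows[i];
        if (!SomeCondition(row))
            continue;

        DataTable details = LoadTable2(conn, row);  // one DB round-trip per row!
        WriteXbrl(details);                         // UBMatrix call, cannot be changed
    }
}
```

The per-row `LoadTable2` call is the classic N+1 query pattern that the suggestions below try to eliminate.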

From my observation, we can avoid the database calls inside the for loop by getting all the data before the loop. When I checked the queries, there were subqueries; patterns like NOT EXISTS (SELECT * FROM table) can be replaced with joins or with NOT EXISTS (SELECT 1 FROM table).
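As an illustration of that rewrite (table and column names are invented, since the real schema is not shown), a correlated per-row lookup could become one set-based query:

```sql
-- Before: executed once per row of table1, with SELECT * inside NOT EXISTS
SELECT d.*
FROM table2 d
WHERE d.parent_id = @currentRowId
  AND NOT EXISTS (SELECT * FROM excluded e WHERE e.id = d.id);

-- After: one query before the loop; table1 joined in instead of the
-- per-row parameter, and SELECT 1 inside NOT EXISTS
SELECT d.*
FROM table2 d
INNER JOIN table1 t ON t.id = d.parent_id
WHERE NOT EXISTS (SELECT 1 FROM excluded e WHERE e.id = d.id);
```

Note that SELECT 1 vs SELECT * inside EXISTS is mostly cosmetic on SQL Server; the real win is replacing thousands of round-trips with one set-based query.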

But the application still needs to process in a loop. I am also thinking of using threading, so that I can create threads based on the size of the data and process it simultaneously.

E.g.

  • If there are 100 rows, there will be 100 entries in the XML (XBRL) file.
  • So I will split them 50/50 and run two threads, which will generate two XML files; at the end I will combine the two into one file.

That way, processing of the 0th row and the 50th row can start at the same time. Currently, in the for loop, the 0th row is processed first and the 99th only at the very end. I am not sure about this idea. Can anyone suggest/share ideas? Any help will be appreciated. Thanks in advance.
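Assuming UBMatrix turns out to be thread-safe (which would need to be verified first), the 50/50 split could be sketched with plain .NET 3.5 threads like this; ProcessRange and MergeXmlFiles are hypothetical helpers:

```csharp
using System.Threading;

// Hypothetical sketch: process each half of the rows on its own thread,
// writing to two separate XML files, then merge at the end.
int half = dataTable.Rows.Count / 2;

Thread t1 = new Thread(() => ProcessRange(dataTable, 0, half, "part1.xml"));
Thread t2 = new Thread(() => ProcessRange(dataTable, half, dataTable.Rows.Count, "part2.xml"));

t1.Start();
t2.Start();
t1.Join();
t2.Join();

MergeXmlFiles("part1.xml", "part2.xml", "final.xml");  // placeholder
```

Writing to two separate files and merging afterwards avoids sharing a writer between threads, but the UBMatrix calls inside ProcessRange must still be safe to run concurrently.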

Not really an answer, just a really large comment:

I would remove multi-threading from your plans unless the UBMatrix API states it is thread-safe; think of all the disk I/O when generating the XBRL.

Have you profiled your app for memory usage? I am thinking of the 15-30 K rows of data being loaded, then possibly transferred into an object model prior to processing and writing to file. If you start to reach the 2 GB limit (32-bit), your process will be doing a lot of paging, which is sooo slooow.

Would this alternative be a possibility? Pre-generate the data to a file, possibly in XML format. Then, hoping UBMatrix has an API which accepts a file path and streams the data, you could just pass off the path to your file. (This is more in case it is a memory issue, but it could still speed things up if the data queries are long-running.)

30 K queries in 30 minutes is just 16 queries per second. That is not very much unless the queries are expensive.

To find out, run SQL Profiler and check the execution time of each query, then multiply by the number of queries. If that comes reasonably close to 30 minutes, you are in luck: you can rewrite all those queries into one join and put the result in a Dictionary or an ILookup.
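For example (column and key names invented), table2 could be loaded once and indexed in memory with LINQ's ToLookup, which is available in .NET 3.5 via System.Core and System.Data.DataSetExtensions; the loop then does in-memory lookups instead of queries:

```csharp
using System.Data;
using System.Linq;

// One query loads all of table2; ToLookup builds an index keyed by the
// column the per-row queries used to filter on (names are made up).
ILookup<int, DataRow> detailsByParent =
    table2.AsEnumerable()
          .ToLookup(r => r.Field<int>("ParentId"));

foreach (DataRow row in table1.Rows)
{
    int id = (int)row["Id"];
    foreach (DataRow detail in detailsByParent[id])  // no DB round-trip
    {
        // ... feed the detail row to the XBRL writer ...
    }
}
```

Unlike a Dictionary, an ILookup returns an empty sequence for a missing key, so no existence check is needed inside the loop.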

If you need to resort to multi-threading, check whether you have the possibility to upgrade to .NET 4. Then you can use Parallel.ForEach or some other suitable method in the TPL to parallelize your work.
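On .NET 4 that could look something like the sketch below; WriteXbrl is a placeholder for the per-row work, and this is only valid if that work is thread-safe:

```csharp
using System.Data;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch: the TPL picks a degree of parallelism and
// partitions the rows across worker threads automatically.
var rows = table1.AsEnumerable().ToList();

Parallel.ForEach(rows, row =>
{
    WriteXbrl(row);  // must be thread-safe for this to be correct
});
```

If the output order of entries in the XBRL file matters, note that Parallel.ForEach does not preserve ordering; per-partition output files merged afterwards (as suggested in the question) would be one way around that.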

Without seeing the code I cannot tell what classes you are using for data access, but from your mention of DataTable.Rows I assume you are using DataSet/DataTable. If you switch to an IDataReader with CommandBehavior.SequentialAccess, you can avoid a lot of the unnecessary overhead that comes with DataSet/DataTable.
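A streaming read with a SqlDataReader might look like this (connection string, query, and column layout are invented for the sketch):

```csharp
using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT Id, Payload FROM table2", conn))
{
    conn.Open();
    // SequentialAccess streams column data in order instead of buffering
    // whole rows, which matters for large payload columns.
    using (IDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        while (reader.Read())
        {
            int id = reader.GetInt32(0);         // columns must be read in order
            string payload = reader.GetString(1);
            // ... process the row without materializing a DataTable ...
        }
    }
}
```

The constraint with SequentialAccess is that columns must be read in ordinal order and each only once; in exchange, rows are never buffered in a DataTable.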

I suggest a profiler too, but for the .NET app. Check where it spends most of its time and attack that place. If it is in the calls that get data from the DB, look at the database and possibly add some new indexes and/or redesign the queries. If it is in the calls to UBMatrix, there is probably not much you can do except get an explanation ready for whoever gave you this task. But before you give up you can try parallel processing, first making sure that UBMatrix is thread-safe, as Simon pointed out. If it is not, or you cannot tell, you can run the parallel work in separate AppDomains to imitate thread safety. This will come at a cost of resources and more complex code, though. Parallel processing will only make sense if, during a normal run of the app, CPU usage stays below about 70% and the disk is not used excessively (check with Resource Monitor), so that there are spare resources to be used.
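The AppDomain idea could be sketched like this; XbrlWorker and its Process method are hypothetical, and the point is only that each domain gets its own copy of the library's static state:

```csharp
using System;

// Hypothetical worker; must derive from MarshalByRefObject so calls
// cross the AppDomain boundary instead of copying the object.
public class XbrlWorker : MarshalByRefObject
{
    public void Process(int startRow, int endRow, string outputFile)
    {
        // ... load rows and call UBMatrix here (placeholder) ...
    }
}

// Caller: one isolated domain per batch, each runnable on its own thread.
AppDomain domain = AppDomain.CreateDomain("xbrl-worker-1");
var worker = (XbrlWorker)domain.CreateInstanceAndUnwrap(
    typeof(XbrlWorker).Assembly.FullName,
    typeof(XbrlWorker).FullName);
worker.Process(0, 49, "part1.xml");
AppDomain.Unload(domain);
```

One caveat: AppDomains isolate managed static state, but if UBMatrix calls into native code, that native state is still shared per process, so this does not guarantee safety.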

If the disk is used a lot, one other way could be to check whether writing the XML files to a RAM drive would improve anything.

Anyway, start with profiling your .NET application - that should give you a good starting point. Here is a free .NET profiler: http://www.eqatec.com/tools/profiler/
