
Generating code from data held in a SQL Database: Should I do it in a Stored Proc or on the Web Server

I have a SQL Server 2008 database containing data from which I need to generate code (actually, it is a SQL script that populates another database with a different structure, but please don't be misled by that: this question is basically about generating a big blob of text from the data).

I am concerned about performance. Therefore, generally speaking, would it be more performant to

a) Generate the code in a stored procedure on the SQL Server:

Pro: The data doesn't have to move over the network, so there are fewer latency issues (although the completed blob of text still has to be sent over, and it may be larger than the source data)

Con: Manipulating the data is cumbersome (cursors) and manipulating strings in T-SQL (I would imagine) is slower than on the web server (.NET)

b) Retrieve the data I need and generate the code on the web server:

Pro: Quicker, more flexible string handling

Con: Bringing all the data back from the SQL box

For the sake of this question, let's assume a data set of around 100,000 rows.


UPDATE: I didn't mention that I am aiming to generate the script from a form submit and send the results straight back to the browser. Solutions using things like SSIS may therefore be of limited use in this scenario.

Look into using SSIS (SQL Server Integration Services). SSIS allows for transformations and should be able to handle batching, etc. for large sets.

Of course, if you need an immediate response from the operation, then SSIS is not going to help much. If the transformations are not terribly complicated and can be done in a single query, you have the option of using SQL CLR, as has already been suggested here. I wrote a library of SQL CLR functions and procedures called SQL# (SQLsharp), which can be found at http://www.SQLsharp.com/ and is mostly free. You can use its DB_BulkCopy stored procedure to do this (again, depending on the complexity of the transformations). DB_BulkCopy is available in the Free version and is based on the .NET SqlBulkCopy class (in case you want to write your own SQL CLR method). It lets you define a query whose result set is sent to a destination connection (either SQL Server or Oracle). The procedure handles batched operations, so transporting 100,000 rows does not have to be a single transaction if you do not want it to be.
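To illustrate the general idea of batching (not the SQL# or SqlBulkCopy API itself), here is a minimal Python sketch. It assumes an in-memory SQLite table as a stand-in for the source database; the `script_chunks` function, the `source` table, and the `Target` table name are all made up for the example. Rows are fetched in fixed-size batches and the generated text is yielded one chunk at a time, which would also suit streaming a response back to a browser.

```python
import sqlite3


def script_chunks(conn, query, batch_size=1000):
    """Yield the generated script in chunks, one chunk per batch of rows."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        # "Target" and its single "Id" column are hypothetical names.
        yield "".join(
            f"INSERT INTO Target (Id) VALUES ({row[0]});\n" for row in rows
        )


# Stand-in source data: 2,500 rows in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER)")
conn.executemany("INSERT INTO source VALUES (?)", [(i,) for i in range(2500)])

chunks = list(script_chunks(conn, "SELECT id FROM source ORDER BY id", batch_size=1000))
print(len(chunks), "chunks generated")
```

Because each chunk is produced independently, the full script never has to sit in memory at once.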

Speaking purely from experience, SQL Server performs string manipulation MUCH more slowly than application code.

I've refactored several programs that take data from one source, manipulate it, and put it in another, and the first and biggest performance gains came from moving all string manipulation into code, using DataSets and System.Text.StringBuilders.
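As a rough illustration of the same pattern outside .NET, the following Python sketch reads rows from a source query and appends one INSERT statement per row to a single buffer (`io.StringIO` plays the role that StringBuilder plays in C#). Everything here is an assumption made for the example: the in-memory SQLite source, the `customers` table, the `dbo.Customer` target, and the helper names `quote_literal` and `generate_insert_script`.

```python
import io
import sqlite3


def quote_literal(value):
    """Render a value as a SQL literal; single quotes in strings are doubled."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def generate_insert_script(conn, query, target_table, columns):
    """Append one INSERT statement per source row to a single text buffer."""
    buffer = io.StringIO()
    column_list = ", ".join(columns)
    for row in conn.execute(query):
        values = ", ".join(quote_literal(v) for v in row)
        buffer.write(
            f"INSERT INTO {target_table} ({column_list}) VALUES ({values});\n"
        )
    return buffer.getvalue()


# Stand-in source data (hypothetical table and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "O'Brien")]
)

script = generate_insert_script(
    conn,
    "SELECT id, name FROM customers ORDER BY id",
    "dbo.Customer",
    ["CustomerId", "FullName"],
)
print(script)
```

Writing into one buffer avoids repeatedly copying an ever-growing string, which is the same reason StringBuilder is preferred over repeated concatenation in .NET.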

I finally found some documentation to back this up: http://msdn.microsoft.com/en-us/library/ms131075.aspx

Additionally, managed code has a decisive performance advantage over Transact-SQL in terms of procedural code, computation, and string manipulation. CLR functions that are computing-intensive and that do not perform data access are better written in managed code.

That said, it might not hurt to try both, benchmark them, and then weigh your options. In addition to performance, consider factors like readability and ease of future maintenance. If the performance difference isn't that great when benchmarking, those other factors may become more important.
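A benchmark of the code-side approach can be very small. The sketch below is only an assumption about how one might time it in Python: `build_script` and `best_time` are hypothetical helpers, the rows are synthetic, and the stored-procedure variant would have to be timed separately against the real database.

```python
import timeit


def build_script(row_count):
    """Generate a synthetic INSERT script with one statement per row."""
    lines = [
        f"INSERT INTO Target (Id, Name) VALUES ({i}, 'row {i}');"
        for i in range(row_count)
    ]
    return "\n".join(lines)


def best_time(func, repeat=3):
    """Run func several times and return the fastest wall-clock time in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


elapsed = best_time(lambda: build_script(100_000))
print(f"best of 3: {elapsed:.3f} s for 100,000 rows")
```

Taking the best of several runs reduces noise from other activity on the machine; for a fair comparison, the time spent fetching the rows over the network should be measured as well.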

Reading your notes on other answers, it may be that security, rather than performance, should be the deciding factor. In general, it's a LOT easier to manipulate strings in code and to sanitize any potentially untrusted user input to prevent SQL injection, XSS, etc. Escaping strings is possible in pure T-SQL, but in code you can create parameterized queries based on the input, which (according to OWASP) is preferable to escaping strings. That's pretty much impossible in T-SQL.

From OWASP:

This third technique is to escape user input before putting it in a query. If you are concerned that rewriting your dynamic queries as prepared statements or stored procedures might break your application or adversely affect performance, then this might be the best approach for you. However, this methodology is frail compared to using parameterized queries. This technique should only be used, with caution, to retrofit legacy code in a cost effective way. Applications built from scratch, or applications requiring low risk tolerance should be built or re-written using parameterized queries.
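The difference between concatenating input and parameterizing it can be seen in a small Python sketch. It uses the standard `sqlite3` module purely as a stand-in (the `users` table and the hostile input are made up for the example), but the same principle applies to parameterized commands in .NET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Hypothetical hostile input that tries to widen the WHERE clause.
hostile = "alice' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause is executed.
unsafe_sql = (
    "SELECT name FROM users WHERE name = '" + hostile + "' ORDER BY name"
)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the input is passed as a value and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ? ORDER BY name", (hostile,)
).fetchall()

print("concatenated query matched", len(unsafe_rows), "rows")
print("parameterized query matched", len(safe_rows), "rows")
```

The concatenated query matches every row in the table, while the parameterized one matches none, because no user is literally named `alice' OR '1'='1`.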

I prefer option A, but I would suggest you forget about using cursors and manipulating the data in SQL Server. Fetch your data from the existing database and use C# code to convert it to XML. Then, if you want to change the data structure, you can transform the XML using XSLT, which is widely known for robust data transformation.

Hope this link could help you.
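As a sketch of the first half of that idea (rows to XML), here is a Python version using only the standard library. The `rows_to_xml` helper, the `customers` table, and the element names are assumptions for illustration. The XSLT step is not shown, since Python's standard library has no XSLT processor and a third-party library would be needed for it.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag, row_tag):
    """Convert a query result into XML, one child element per column."""
    cursor = conn.execute(query)
    names = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(names, row):
            # ElementTree escapes special characters such as & and <.
            ET.SubElement(item, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Stand-in source data (hypothetical table and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "A&B")])

xml_text = rows_to_xml(
    conn, "SELECT id, name FROM customers ORDER BY id", "customers", "customer"
)
print(xml_text)
```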

