
SQL Server loop openrowset performance

I have the following stored procedure to loop through hundreds of different JSON files that are downloaded to the server every day.

The issue is that the query takes a good 15 minutes to run, and I will soon need to create something similar for a much larger number of JSON files. Can somebody point me in the right direction for improving the performance of the query?

DECLARE @json VARCHAR(MAX) = ''
DECLARE @Int INT = 1
DECLARE @Union INT = 0
DECLARE @sql NVARCHAR(max)
DECLARE @PageNo INT = 300

WHILE (@Int < @PageNo)
BEGIN
    SET @sql = (
    'SELECT 
        @cnt = value
    FROM 
        OPENROWSET (BULK ''C:\JSON\tickets' + CONVERT(varchar(10), @Int) + '.json'', SINGLE_CLOB) as j
        CROSS APPLY OPENJSON(BulkColumn)
    WHERE
        [key] = ''tickets''
    ')

    EXECUTE sp_executesql @sql, N'@cnt nvarchar(max) OUTPUT', @cnt = @json OUTPUT

    IF NOT EXISTS (SELECT * FROM OPENJSON(@json) WITH ([id] int) j JOIN tickets t ON t.id = j.id)
    BEGIN
        INSERT INTO
            tickets (id, Field1)
        SELECT
            *
        FROM OPENJSON(@json)
             WITH ([id] int, Field1 int)
    END

    SET @Int += 1 -- without this increment the loop never advances to the next file
END

It seems the BULK INSERT in your loop is the bottleneck. Generally a BULK INSERT is the fastest way to load data; here, however, the sheer number of files appears to be the problem.

To make things faster you would want to read the JSON files in parallel. You could do that by first building the complete dynamic SQL query for all files, or for groups of files, and reading them together.
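As a minimal sketch of the "one dynamic query per file group" idea, assuming the same `C:\JSON\ticketsN.json` naming scheme as in the question: instead of one `sp_executesql` round trip per file, concatenate several `OPENROWSET` reads with `UNION ALL` into a single batch, so a whole group of files is handled in one statement.

```sql
-- Hypothetical sketch: build ONE statement covering a group of files
-- rather than executing dynamic SQL once per file.
DECLARE @sql NVARCHAR(MAX) = N'';
DECLARE @i INT = 1;
DECLARE @PageNo INT = 300;

WHILE (@i < @PageNo)
BEGIN
    SET @sql += CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
        + N'SELECT j.BulkColumn FROM OPENROWSET (BULK ''C:\JSON\tickets'
        + CONVERT(nvarchar(10), @i) + N'.json'', SINGLE_CLOB) AS j';
    SET @i += 1;
END

-- One execution reads the whole group; wrap the result in OPENJSON as needed.
EXECUTE sp_executesql @sql;
```

Splitting the range into a few such groups and running them from separate connections is what allows the reads to actually overlap.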

I would rather advise using Integration Services with a script component as the source in parallel data flow tasks. First read all files from the source folder, split them into, say, 4 groups, and give each group a loop container that runs in parallel. Depending on the executing machine, you can use as many parallel flows as it can handle. Already 2 data flows should make up for the overhead of Integration Services.

Another option would be to write a CLR (common language runtime) stored procedure and deserialize the JSON in parallel using C#.

It also depends on the machine doing the job. You would want enough RAM and free CPU capacity, so consider running the import while the machine is not busy.

One method I've had success with when loading data into tables from lots of individual XML files, which you might be able to apply to this problem, is the FileTable feature of SQL Server.

The way it worked was to set up a FileTable in the database, then grant the process that uploads the XML files access to the FileStream share created on the server. XML files were then dropped into the share and were immediately available in the database for querying using XPath.

A process would then run XPath queries to load the required data from the XML into the target tables and keep track of which files had been loaded; when the next schedule came along, it would only load data from the newest files.
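Adapted to the JSON case in the question, that "load only the newest files" step might look roughly like this. The FileTable `dbo.TicketFiles` and the tracking table `dbo.LoadedFiles` are hypothetical names; dropping a file into the FileTable share makes its contents queryable through the `file_stream` column.

```sql
-- Sketch, under the assumptions above: import rows from any JSON file
-- in the FileTable that has not been recorded as loaded yet.
INSERT INTO tickets (id, Field1)
SELECT j.id, j.Field1
FROM dbo.TicketFiles AS f
CROSS APPLY OPENJSON(CONVERT(varchar(max), f.file_stream))
     WITH ([id] int, Field1 int) AS j
WHERE f.name LIKE '%.json'
  AND NOT EXISTS (SELECT 1 FROM dbo.LoadedFiles l WHERE l.name = f.name);

-- Record the files just processed so the next scheduled run skips them.
INSERT INTO dbo.LoadedFiles (name)
SELECT f.name
FROM dbo.TicketFiles AS f
WHERE f.name LIKE '%.json'
  AND NOT EXISTS (SELECT 1 FROM dbo.LoadedFiles l WHERE l.name = f.name);
```

The tracking table is what lets each scheduled run be incremental instead of re-reading every file.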

A scheduled task on the machine would then remove files once they were no longer required.

Have a read up on FileTable here:

FileTables (SQL Server)

It's available in all SQL Server editions.
