
What are the pitfalls of inserting millions of records into SQL Server from a flat file?

I am about to start on a journey writing a Windows Forms application that will open a pipe-delimited .txt file about 230 MB in size. The app will then insert this data into a SQL Server 2005 database (obviously this needs to happen swiftly). I am using C# 3.0 and .NET 3.5 for this project.

I am not asking for the app, just some communal advice and advice on potential pitfalls. From this site I have gathered that SQL bulk copy is a prerequisite. Is there anything else I should think about? (I think that just opening the txt file with a forms app will be a large endeavor; maybe break it into blob data?)

Thank you, and I will edit the question for clarity if anyone needs it.

Do you have to write a WinForms app? It might be much easier and faster to use SSIS. There are some built-in tasks available, especially the Bulk Insert task.

Also, it is worth checking Flat File Bulk Import methods speed comparison in SQL Server 2005.

Update: If you are new to SSIS, check out some of these sites to get you on the fast track:
1) SSIS Control Flow Basics
2) Getting Started with SQL Server Integration Services

Here is another how-to, on importing an Excel file into SQL 2005.

This is going to be a streaming endeavor.

If you can, do not use transactions here. The transactional cost will simply be too great.

So what you're going to do is read the file one line at a time and insert it one line at a time. You should dump failed inserts into another file so that you can diagnose them later and see where they failed.
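Here is a minimal sketch of that loop. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server so it runs anywhere. The `customers` table and its three columns are hypothetical. Iterating an open file object reads one line at a time. Any row whose INSERT raises is written to a separate error file, together with its line number and the error message. If you open the connection with `sqlite3.connect(path, isolation_level=None)`, `sqlite3` runs in autocommit mode, so each INSERT commits on its own rather than inside one large transaction.

```python
import sqlite3
from typing import Iterable, TextIO

# Hypothetical target table; in the real app this would be the SQL Server table.
INSERT_SQL = "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)"


def stream_insert(lines: Iterable[str], conn: sqlite3.Connection, errors: TextIO) -> int:
    """Insert pipe-delimited lines one at a time, logging failures to `errors`.

    `lines` can be an open file object: iterating it reads one line at a time,
    so the whole file is never held in memory.
    """
    inserted = 0
    for line_no, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        fields = text.split("|")
        try:
            conn.execute(INSERT_SQL, fields)
            inserted += 1
        except sqlite3.Error as exc:
            # Keep the line number, the error and the raw line for later diagnosis.
            errors.write(f"{line_no}\t{exc}\t{text}\n")
    return inserted
```

With a real file, you would pass something like `open("data.txt", encoding="utf-8")` as `lines` and another opened file as `errors`. The file name and encoding here are assumptions about your data.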

At first I would go ahead and try a bulk insert of a couple of hundred rows, just to see that the streaming is working properly; then you can open it up as much as you want.
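One way to do that trial run is to cut the stream off after a couple of hundred lines with `itertools.islice`, which stops reading as soon as the limit is reached. In this hypothetical sketch, the trial only tallies how many pipe-delimited fields each line has. That is a quick check that the parsing matches the target table before anything is inserted. The 200-line default is an assumption.

```python
from collections import Counter
from itertools import islice
from typing import Iterable


def field_count_profile(lines: Iterable[str], limit: int = 200, delimiter: str = "|") -> Counter:
    """Tally the number of fields in each of the first `limit` lines only."""
    # islice stops pulling from `lines` after `limit` items, so a large file
    # is not read past the trial window.
    return Counter(
        len(line.rstrip("\r\n").split(delimiter)) for line in islice(lines, limit)
    )
```

If the result has more than one key (for example, some lines with 12 fields and some with 11), the file has ragged rows that will fail on insert.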

You could try using SqlBulkCopy. It lets you pull from "any data source".

Just as a side note, it is sometimes faster to drop the table's indexes and recreate them after the bulk insert operation.
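Here is the pattern in miniature. It again uses `sqlite3` as a stand-in and a hypothetical `customers` table with a secondary index `ix_customers_city`. On SQL Server, the equivalent steps would be `DROP INDEX` and `CREATE INDEX`, or `ALTER INDEX ... DISABLE` followed by `ALTER INDEX ... REBUILD` for nonclustered indexes. Whether this is actually faster depends on the table and the data, so it is worth timing both ways.

```python
import sqlite3
from typing import Iterable, Sequence


def load_with_index_rebuild(conn: sqlite3.Connection, rows: Iterable[Sequence[str]]) -> None:
    """Drop a secondary index, insert all rows, then build the index once."""
    conn.execute("DROP INDEX IF EXISTS ix_customers_city")
    try:
        # Without the index, each insert only has to write the table row.
        conn.executemany("INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", rows)
        conn.commit()
    except sqlite3.Error:
        conn.rollback()  # discard a partial load
        raise
    finally:
        # Recreate the index even if the load failed, so the table is never
        # left without it; building it once scans the loaded data in one pass.
        conn.execute("CREATE INDEX IF NOT EXISTS ix_customers_city ON customers (city)")
        conn.commit()
```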

You might consider switching from the full recovery model to bulk-logged. Under bulk-logged recovery, a bulk load can be minimally logged, which helps keep the transaction log a reasonable size.

I totally recommend SSIS: you can read in millions of records and clean them up along the way in relatively little time.

You will need to set aside some time to get to grips with SSIS, but it should pay off. There are a few other threads here on SO which will probably be useful:

What's the fastest way to bulk insert a lot of data in SQL Server (C# client)

What are the recommended learning materials for SSIS?

You can also create a package from C#. I have a C# program which reads a 3GL "master file" from a legacy system (parsing it into an object model using an API I have for a related project), then takes a package template and modifies it to generate a package for the ETL.

If the column format of the file matches the target table where the data needs to end up, I prefer using the command-line utility bcp to load the data file. It's blazingly fast, and you can specify an error file for any "odd" records that fail to be inserted.

If you need to store the command-line parameters for it (server, database, username/password or trusted connection, table, error file, etc.), your app could kick off the command itself.

I like this method better than running a BULK INSERT SQL command because the data file isn't required to be on a system accessible by the database server. To use BULK INSERT, you have to specify the path to the data file to load. That path must be visible to, and readable by, the system user on the database server that is running the load. Usually too much hassle for me. :-)
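If the app does launch bcp, building the argument list in code keeps the stored parameters in one place. This is a sketch under a few assumptions:

- a character-mode (`-c`), pipe-delimited file;
- a fully qualified `database.schema.table` target;
- hypothetical server, file and table names;
- a `max_errors` default of 1000, which is also only an assumption.

Passing a list to `subprocess.run` avoids a shell, so the `|` terminator is not treated as a pipe. The `-m` flag matters if you expect "odd" records, because bcp cancels the load after 10 errors by default.

```python
import subprocess
from typing import List, Optional


def build_bcp_args(table: str, data_file: str, server: str, error_file: str,
                   user: Optional[str] = None, password: str = "",
                   max_errors: int = 1000, batch_size: int = 10000) -> List[str]:
    """Build a `bcp <table> in <file>` command line for a pipe-delimited text file."""
    args = [
        "bcp", table, "in", data_file,
        "-S", server,
        "-c",                   # character (text) data file
        "-t", "|",              # field terminator is a pipe
        "-e", error_file,       # rows that fail to load are written here
        "-m", str(max_errors),  # cancel only after this many errors (bcp default: 10)
        "-b", str(batch_size),  # rows per batch; each batch commits separately
    ]
    if user is None:
        args.append("-T")       # trusted (Windows) connection
    else:
        args += ["-U", user, "-P", password]
    return args


def run_bcp(args: List[str]) -> None:
    # A list argument means no shell is involved; raises CalledProcessError
    # if bcp exits with a non-zero status.
    subprocess.run(args, check=True)
```

Usage would look like `run_bcp(build_bcp_args("MyDb.dbo.Customers", r"C:\data\in.txt", "SQL01", r"C:\data\err.txt"))`, where every name is a placeholder.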

The size of data you're talking about actually isn't that gigantic. I don't know what your efficiency concerns are, but if you can wait a few hours for it to insert, you might be surprised at how easy this is to accomplish with the really naive technique of INSERTing each row one at a time. Batching together a thousand or so rows at a time and submitting them to SQL Server may make it quite a bit faster as well.

Just a suggestion that could save you some serious programming time, if you don't need it to be as fast as conceivable. Depending on how often this import has to run, saving a few days of programming time could easily be worth waiting a few hours while it runs.
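Batching can be sketched with `itertools.islice`, which pulls fixed-size chunks from the line stream without loading the whole file. Each chunk goes to the database in one `executemany` call and is then committed. As in the other sketches, `sqlite3` stands in for SQL Server. The `customers` table and the 1000-row batch size are assumptions.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

INSERT_SQL = "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)"  # hypothetical table


def batches(rows: Iterable[Sequence[str]], size: int) -> Iterator[List[Sequence[str]]]:
    """Yield lists of up to `size` rows, pulling lazily from `rows`."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn: sqlite3.Connection, lines: Iterable[str], size: int = 1000) -> int:
    """Parse pipe-delimited lines and insert them `size` rows per round trip."""
    rows = (line.rstrip("\r\n").split("|") for line in lines)
    total = 0
    for chunk in batches(rows, size):
        conn.executemany(INSERT_SQL, chunk)  # one call per chunk instead of per row
        conn.commit()                        # commit each batch separately
        total += len(chunk)
    return total
```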

You could use SSIS for the read and insert, but call it as a package from your WinForms app. Then you could pass in things like source, destination, connection strings, etc. as parameters/configurations.

HowTo: http://msdn.microsoft.com/en-us/library/aa337077.aspx

You can set up transforms and error handling inside SSIS and even create logical branching based on input parameters.
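The linked how-to runs the package through the SSIS runtime API. A simpler alternative, swapped in here, is to have the app launch the `dtexec` command-line utility and override package variables with `/SET`. The sketch below only builds the argument list, which you would then pass to `subprocess.run(args, check=True)`. The package path and the `User::SourceFile` variable are hypothetical. The property-path format is an assumption to check against the dtexec documentation for your SQL Server version.

```python
from typing import Dict, List


def build_dtexec_args(package_path: str, variables: Dict[str, str]) -> List[str]:
    """Build a dtexec command line that runs a .dtsx file and sets user variables."""
    args = ["dtexec", "/F", package_path]  # /F: package stored in the file system
    for name, value in variables.items():
        # /SET takes "propertyPath;value"; this path targets a package-level
        # user variable (assumed layout, verify against your version's docs).
        args += ["/SET", f"\\Package.Variables[User::{name}].Properties[Value];{value}"]
    return args
```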
