简体   繁体   中英

Bulk importing text files / VB2005 / SQL Server 2005

I've inherited a .NET app to support / enhance which reads in a couple of files of high hundreds of thousands of rows, and one of millions of row.

The original developer left me code like:-

                For Each ModelListRow As String In ModelListDataArray
                    If ModelListRow.Trim.Length = 0 Or ModelListRow.Contains(",") = False Then
                        GoTo SKIP_ROW
                    End If

                    Dim ModelInfo = ModelListRow.Split(",")
                    Dim ModelLocation As String = UCase(ModelInfo(0))
                    Dim ModelCustomer As String = UCase(ModelInfo(1))
                    Dim ModelNumber As String = UCase(ModelInfo(2))

                    If ModelLocation = "LOCATION" Or ModelNumber = "MODEL" Then
                        GoTo SKIP_ROW
                    End If

                    Dim MyDataRow As DataRow = dsModels.Tables(0).NewRow
                    MyDataRow.Item("location") = ModelLocation.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
                    MyDataRow.Item("model") = ModelNumber.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
                    dsModels.Tables(0).Rows.Add(MyDataRow)
SKIP_ROW:
                Next

and it takes an age (well, nearly half an hour) to import these files.

I suspect there's a MUCH better way to do it. I'm looking for suggestions.

Thanks in advance.

Take a look at BULK INSERT.

http://msdn.microsoft.com/en-us/library/ms188365(v=SQL.90).aspx

Basically you point SQL Server at a text file in CSV format and it does all the logic of pulling the data into a table. If you need to massage it more than that, you can pull the text file into a staging location in SQL Server, and then run a stored proc to massage it into the format you are looking for.

The main options (apart from writing your own code from scratch) are:

  • BULK INSERT or bcp.exe, which work well if your data is cleanly formatted
  • SSIS, if you need workflow, data type transformations, data cleansing etc.
  • .NET SqlBulkCopy API

jkohlhepp's suggestion about pulling data into a staging table then cleaning it is a good one and a very common pattern in ETL processes. But if your "massaging" isn't easy to do in TSQL then you will probably need some .NET code anyway, whether it's in SSIS or in a CLR procedure.

Personally I would use SSIS in your case, because it looks like the data is not cleanly formatted so you will probably need some custom code to clean/re-format the data on its way to the database. However it does depend on what you're most comfortable/productive with and what existing tools and standards you have in place.

Dim ExcelConnection As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\MyExcelSpreadsheet.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=Yes""")
ExcelConnection.Open()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM