CSV import to SQL Server

I have a CSV file (shrlgpa.csv) that has 29,471 rows. However, when I import it into SQL Server, the output reports 29,482 rows copied. Why is this happening?

Here is the script I use; it has worked fine for all my other CSV files:

-- Enable xp_cmdshell (an advanced option) so the script can run bcp.exe
USE master
GO
EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure 'xp_cmdshell', 1
GO
RECONFIGURE WITH OVERRIDE
GO

USE [VCC_BE_SQL_stg]
GO

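-- bcp flags: -a network packet size, -b rows per batch, -c character mode,
-- -F 2 start at row 2 (skip the header), -U/-P SQL login, -S server\instance,
-- -t 0x7C '|' field terminator, -r 0x0A LF row terminator, -E keep identity values
DECLARE @cmd varchar(8000)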
= CONCAT( '"C:\Program Files\Microsoft SQL Server\110\Tools\Binn\bcp.exe"'
    , ' VCC_BE_SQL_stg.VCC.{tbl} in C:\VCCBanner\{tbl}.csv'
    , ' -a 65535 -b 10000 -c -F 2 -U mssql_stg -P password2012 -S'
    , ' WIN-8I8OQB38II4\SQLEXPRESS2012 -t 0x7C -r 0x0A -E'
    )

DECLARE @tables TABLE ( tbl varchar(255) PRIMARY KEY )
INSERT INTO @tables VALUES ('shrlgpa');

DECLARE @tbl varchar(255) = ( SELECT MIN(tbl) FROM @tables )
DECLARE @nth int = 0
DECLARE @cnt int = ( SELECT COUNT(1) FROM @tables )
DECLARE @sql nvarchar(255) = N'TRUNCATE TABLE VCC_BE_SQL_stg.VCC.'
DECLARE @run nvarchar(255)
-- For each table name in @tables: truncate the table, then reload it with bcp
WHILE ( @nth <> @cnt )
BEGIN
        SET @tbl = (
            SELECT tbl
            FROM @tables
            ORDER BY tbl
            OFFSET @nth ROWS
            FETCH NEXT 1 ROWS ONLY
        )
        -- Empty the staging table
        SET @run = ( SELECT @sql + @tbl )
        select @run
        EXEC sp_executesql @run
        -- Substitute the table name into the bcp command line and run it
        SET @run = ( SELECT REPLACE( @cmd, '{tbl}', @tbl) )
        select @run
        EXEC xp_cmdshell @run
        SET @nth = @nth + 1
END

I have checked the names and the number of fields, but I can't see anything wrong there, especially since the script worked perfectly with other files.

Here are debugging steps I would follow.

  1. I would be highly suspicious of the data file itself. Have you opened the CSV in Notepad or Notepad++ to see whether there are extra lines at the end? I would look there first: place your cursor right after the very last value of the last data line, hold down the Shift key, highlight everything after that point, and delete it.

    1. If that still results in extra rows, review the file again. The likely cause is data in the CSV that itself contains commas (or whatever character serves as the delimiter). Is the file text-qualified, meaning is the data wrapped in double quotes?

    2. The best fix going forward is to improve how the data is extracted. Garbage in, garbage out: a poorly formatted data file will haunt you and always be a problem.
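One way to follow up on these steps is to compare the file's physical line count with the number of records a quote-aware parser finds. The sketch below uses Python's csv module, which the answer above does not prescribe. It assumes a header row, | as the delimiter (matching -t 0x7C in the question's bcp command), and double quotes as the text qualifier. diagnose is a hypothetical helper name.

```python
import csv
import io


def diagnose(text, delimiter="|"):
    """Compare physical lines with quote-aware CSV records (hypothetical helper).

    Assumes a header row, '|' as the delimiter and '"' as the quote character.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final newline ends the last line; it is not a blank line
    trailing_blank = 0
    for line in reversed(lines):
        if line.strip():
            break
        trailing_blank += 1

    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    width = len(next(reader))  # number of columns in the header
    prev = reader.line_num
    records, multiline, wrong_width = 0, [], []
    for fields in reader:
        consumed, prev = reader.line_num - prev, reader.line_num
        if not fields:
            continue  # the csv module yields [] for a blank line
        records += 1
        if consumed > 1:
            multiline.append(records)  # a quoted field contained a line break
        if len(fields) != width:
            wrong_width.append((records, len(fields)))
    return {
        "physical_lines": len(lines),
        "data_records": records,
        "trailing_blank_lines": trailing_blank,
        "multiline_records": multiline,
        "wrong_field_count": wrong_width,
    }


sample = 'id|name|note\n1|Ann|ok\n2|Bob|"two\nlines"\n3|Cy|a|b\n\n\n'
print(diagnose(sample))
```

For this made-up sample, diagnose reports 7 physical lines but only 3 data records. It also flags 2 trailing blank lines, one record whose quoted field spans two lines, and one record with an extra |. To check the real file, read it with open(path, newline="") and the file's actual encoding, then pass the text in.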

Good luck!

Where does standard error go? Where, that is, do you look for error messages from bcp.exe?

What you're doing (using dynamic SQL to execute a command-line utility that loads the database) is perfectly awful. If the file is on the same machine as the server (or is accessible via a file server), you might want to look into BULK INSERT.

No use of bcp.exe is robust in the face of errors, which is to say it's useless for batch programming. For one thing, it does not return an error status when rows fail to load, which means the caller cannot detect any problems without parsing the standard error output. For another, rows are dropped singly or in batches, depending on the nature of the error.
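To make that parsing step concrete, here is a minimal Python sketch. It scans captured bcp.exe output for error lines and compares the reported row count with the expected count. The message formats it matches are assumptions about what bcp prints: a summary such as "29482 rows copied." and error lines starting with SQLState or Error. check_bcp_output is a hypothetical name. From T-SQL, the command's output could be captured with INSERT INTO <some table> EXEC xp_cmdshell @run instead of being discarded. bcp's -e option can also write the rows it could not load to an error file.

```python
import re

# Assumed format of bcp's summary line, e.g. "29482 rows copied."
ROWS_COPIED = re.compile(r"^[ \t]*(\d+) rows copied\.", re.MULTILINE)


def check_bcp_output(output, expected_rows):
    """Return the problems found in captured bcp.exe output (hypothetical helper).

    Assumes error lines start with "SQLState" or "Error"; adjust the
    prefixes to whatever your bcp version actually prints.
    """
    problems = [
        ln.strip()
        for ln in output.splitlines()
        if ln.lstrip().startswith(("SQLState", "Error"))
    ]
    match = ROWS_COPIED.search(output)
    if match is None:
        problems.append("no 'rows copied' summary found")
    elif int(match.group(1)) != expected_rows:
        problems.append(f"copied {match.group(1)} rows, expected {expected_rows}")
    return problems


print(check_bcp_output("Starting copy...\n2 rows copied.\n", 3))  # ['copied 2 rows, expected 3']
```

An empty list means the sketch found nothing suspicious. It does not prove that every row loaded as intended.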

What is undoubtedly happening in your case is that some rows don't conform to your hopes and expectations and are being rejected by bcp.exe (not by the server, which never sees them). One obvious problem might be a quoted CSV field such as "like, this", which bcp.exe will interpret as two fields because it has no notion of quoting. Rejected rows are reported on standard error in great detail. Your challenge is to find and read that error report.
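The quoting problem is easy to reproduce. In this small Python illustration, str.split stands in for a tool with no notion of quoting (it is not bcp.exe itself). The csv module, by contrast, treats the quoted comma as data:

```python
import csv
import io

line = 'Smith,"like, this",42'

# Splitting on every delimiter, with no notion of quoting
naive = line.split(",")
# A quote-aware parser keeps the quoted comma inside one field
aware = next(csv.reader(io.StringIO(line)))

print(naive)  # ['Smith', '"like', ' this"', '42']
print(aware)  # ['Smith', 'like, this', '42']
```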

Once you do that, you'll come to see that CSV + BCP = fail. If you have absolute control over the CSV file, I recommend that you not use as a delimiter any character that can appear in ordinary English prose, such as the comma. Use tabs, for example, so that no one thinks you have a CSV file, with all the complexity that would entail.
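If you do control the file, one safeguard is to verify that the chosen delimiter never appears in the data before writing it out. The Python helper below is hypothetical. Its candidate list (tab, pipe, and the ASCII unit separator 0x1F) is an illustrative assumption:

```python
def pick_delimiter(rows, candidates=("\t", "|", "\x1f")):
    """Return the first candidate delimiter that appears in no field.

    rows is a list of lists of strings. The candidates (tab, pipe and the
    ASCII unit separator) are illustrative assumptions.
    """
    for cand in candidates:
        if not any(cand in field for row in rows for field in row):
            return cand
    raise ValueError("every candidate delimiter occurs somewhere in the data")


rows = [["1", "notes, with commas"], ["2", "a\ttab"]]
print(repr(pick_delimiter(rows)))  # '|'
```

Running the same check in whatever produces the file catches a clash at export time rather than at load time.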

If you don't control the format of the input file, don't use bcp.exe to load it. First parse it as a CSV file with a tool that accounts for the hideous variety of CSV syntax, and produce a file strictly delimited by tabs or some other character that bcp.exe won't trip over. Then load that.
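Here is a sketch of that parse-then-reformat step in Python, under stated assumptions. The input is comma-delimited with standard double-quote qualification. The output uses tab as the field terminator and a bare LF as the row terminator (matching -r 0x0A). csv_to_tsv is a hypothetical name. Fields that contain a tab or a line break are rejected rather than written, since the tab-delimited output has no way to quote them.

```python
import csv
import io


def csv_to_tsv(src, dst, delimiter=","):
    """Re-emit quote-aware CSV records as strictly tab-delimited lines.

    src and dst are text streams. Each row is ended with a single LF to match
    a bcp row terminator of 0x0A. A field holding a tab or a line break is
    rejected, since the output has no quoting to protect it. Blank lines are
    skipped. Returns the number of rows written.
    """
    reader = csv.reader(src, delimiter=delimiter)
    written = 0
    for fields in reader:
        if not fields:
            continue
        for field in fields:
            if any(ch in field for ch in "\t\r\n"):
                raise ValueError(
                    f"record ending at line {reader.line_num}: "
                    f"field {field!r} contains a tab or line break"
                )
        dst.write("\t".join(fields) + "\n")
        written += 1
    return written


src = io.StringIO('id,name\n1,"Smith, J"\n\n2,Lee\n')
dst = io.StringIO()
print(csv_to_tsv(src, dst))  # 3
print(repr(dst.getvalue()))  # 'id\tname\n1\tSmith, J\n2\tLee\n'
```

For real files, open the input with newline="" as the csv module documentation recommends. Open the output with newline="" too, so that Python does not translate LF to CRLF on Windows. According to the bcp documentation, -c mode defaults to a tab field terminator, so the converted file could be loaded with -t 0x7C dropped and -r 0x0A kept.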

In case you really want to dig into it, the FreeTDS project includes a utility freebcp that does return an error status and can be used reliably in batch. Last I looked, though, you'll have to compile it first; it's distributed only as source code.
