
CSV import to SQL Server

I have a CSV file (shrlgpa.csv) that has 29,471 rows. However, when I import it to SQL Server, the output displays 29,482 rows copied. Why is this happening?

Here is the script I use; it worked fine for all the other CSV files:

USE master
GO
EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure 'xp_cmdshell', 1
GO
RECONFIGURE WITH OVERRIDE
GO

USE [VCC_BE_SQL_stg]
GO

DECLARE @cmd varchar(8000)
= CONCAT( '"C:\Program Files\Microsoft SQL Server\110\Tools\Binn\bcp.exe"'
    , ' VCC_BE_SQL_stg.VCC.{tbl} in C:\VCCBanner\{tbl}.csv'
    , ' -a 65535 -b 10000 -c -F 2 -U mssql_stg -P password2012 -S'
    , ' WIN-8I8OQB38II4\SQLEXPRESS2012 -t 0x7C -r 0x0A -E'
    )

DECLARE @tables TABLE ( tbl varchar(255) PRIMARY KEY )
INSERT INTO @tables VALUES ('shrlgpa');

DECLARE @tbl varchar(255) = ( SELECT MIN(tbl) FROM @tables )
DECLARE @nth int = 0
DECLARE @cnt int = ( SELECT COUNT(1) FROM @tables )
DECLARE @sql nvarchar(255) = N'TRUNCATE TABLE VCC_BE_SQL_stg.VCC.'
DECLARE @run nvarchar(255)
WHILE ( @nth <> (@cnt) )
BEGIN
        SET @tbl = (
            SELECT tbl
            FROM @tables
            ORDER BY tbl
            OFFSET @nth ROWS
            FETCH NEXT 1 ROWS ONLY
        )
        --SET @SQL = (SELECT N'TRUNCATE TABLE VCC_BE_SQL_stg.VCC.' + @tbl)
        SET @run = ( SELECT @sql + @tbl )
        select @run
        EXEC sp_executesql @run
        SET @run = ( SELECT REPLACE( @cmd, '{tbl}', @tbl) )
        select @run
        EXEC xp_cmdshell @run
        SET @nth = @nth + 1
END

I have checked the names and the number of the fields, but I can't see anything wrong there, especially since the script worked perfectly with the other files.

Here are the debugging steps I would follow.

  1. I would be highly suspicious of the data file itself. Have you opened the CSV in Notepad or Notepad++ to see if there are extra lines at the end? I would look there first: place your cursor right after the last value of the last data line, hold down the Shift key, select everything after that point, and delete it.

    1. If that still results in more rows, review the file again. The likely cause is values in the CSV that themselves contain commas. Is the file text-qualified, meaning are the values wrapped in double quotes?

    2. The best way forward is to work on how the data is being extracted. Garbage in, garbage out. If you have a poorly formatted data file, it will haunt you and always be a problem.
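The manual inspection described above can also be scripted. The following is a minimal sketch, not a definitive diagnostic: the function name `profile_delimited_file` is hypothetical, and the `|` delimiter and `\n` row terminator are assumptions taken from the `-t 0x7C -r 0x0A` flags in the question's bcp command. It reports the physical line count, blank lines, and any lines whose field count differs from the most common one.

```python
from collections import Counter


def profile_delimited_file(path, delimiter="|", encoding="utf-8"):
    """Summarize line and field counts for a delimited text file.

    Assumption: rows end with "\n" and fields are split on `delimiter`,
    mirroring the -r 0x0A and -t 0x7C flags in the question.
    """
    field_counts = Counter()
    line_fields = []   # (line_number, field_count) for non-blank lines
    blank_lines = []   # line numbers that hold only whitespace
    total_lines = 0

    # newline="\n" splits on "\n" only and leaves any "\r" in the data,
    # so stray carriage returns are not silently normalized away.
    with open(path, encoding=encoding, errors="replace", newline="\n") as handle:
        for line_number, line in enumerate(handle, start=1):
            total_lines += 1
            record = line.rstrip("\n")
            if record.strip() == "":
                blank_lines.append(line_number)
                continue
            count = record.count(delimiter) + 1
            field_counts[count] += 1
            line_fields.append((line_number, count))

    # Treat the most common field count as the expected one.
    expected = field_counts.most_common(1)[0][0] if field_counts else None
    odd_lines = [number for number, count in line_fields if count != expected]

    return {
        "total_lines": total_lines,
        "blank_lines": blank_lines,
        "field_counts": dict(field_counts),
        "odd_lines": odd_lines,
    }
```

Calling `profile_delimited_file(r"C:\VCCBanner\shrlgpa.csv")` and looking at `blank_lines` and `odd_lines` narrows the search to specific line numbers. Lines with too few fields often indicate a line break embedded inside a value.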

Good luck!

Where does standard error go? That is, where do you look for error messages from bcp.exe?

What you're doing (using dynamic SQL to execute a command-line utility to load the database) is perfectly awful. If the file is on the same machine as the server (or is accessible via a file server), you might want to look into BULK INSERT.

No use of bcp.exe is robust in the face of errors, which is to say it's useless for batch programming. For one thing, it does not return an error status when rows fail to load, which means the caller cannot detect any problems without parsing the standard error output. For another, rows are dropped singly or in batches, depending on the nature of the error.

What is undoubtedly happening in your case is that some rows don't conform to your hopes and expectations, and are being rejected by bcp.exe (not by the server, which never sees them). One obvious problem might be a quoted CSV field "like, this", which bcp.exe will interpret as two fields because it has no notion of quoting. Rejected rows are reported on standard error in great detail. Your challenge is to find and read the error report.
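One way to stop losing that report is to launch the loader from a script that captures both output streams instead of going through xp_cmdshell. This is a rough sketch under stated assumptions: the helper names are hypothetical, the `N rows copied` pattern is assumed from the output described in the question, and treating any text on standard error as a failure follows the description above rather than documented bcp behavior. bcp also accepts `-e err_file` to write rejected rows to a file, which may be simpler if you only need the bad rows.

```python
import re
import subprocess


def run_and_capture(command):
    """Run a command (a list of arguments) and return (exit_code, stdout, stderr)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


def rows_copied(output):
    """Extract N from text such as '29482 rows copied.'; return None if absent.

    Assumption: the loader prints its total in this form.
    """
    match = re.search(r"(\d+) rows copied", output)
    return int(match.group(1)) if match else None


def load_looks_clean(command, expected_rows):
    """Return True only if the exit code is 0, stderr is empty, and the count matches."""
    exit_code, stdout, stderr = run_and_capture(command)
    return (
        exit_code == 0
        and stderr.strip() == ""
        and rows_copied(stdout) == expected_rows
    )
```

Because the exit status alone is not trustworthy here, the check combines three signals. Comparing the reported count with a count taken independently from the file is what would have flagged the 29,471 versus 29,482 discrepancy automatically.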

Once you do that, you'll come to see that CSV + BCP = fail. If you have absolute control over the CSV file, then I recommend you don't use as a delimiter any character that can appear in ordinary English prose, such as the comma. Use tabs, for example, so no one thinks you have a CSV file, with all the complexity that would entail.

If you don't control the format of the input file, don't use bcp.exe to load it. Parse it first as a CSV file with a tool that accounts for the hideous variety of CSV syntax, and produce a file strictly delimited by tabs or some other character that bcp.exe won't trip over. Then load that.
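As an illustration of that pre-parsing step, here is a minimal sketch using Python's standard `csv` module, which understands quoted fields. The function name `csv_to_tsv` is hypothetical, and replacing embedded tabs and line breaks with a space is one possible policy, chosen here only so that every output row stays on one line with a fixed number of tabs.

```python
import csv
import re


def csv_to_tsv(source_path, target_path, encoding="utf-8"):
    """Rewrite a quoted CSV file as strictly tab-delimited text.

    Returns the number of rows written. Assumption: the input is
    comma-delimited with double-quote text qualifiers (csv defaults).
    """
    rows_written = 0
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(source_path, encoding=encoding, newline="") as source, \
            open(target_path, "w", encoding=encoding, newline="") as target:
        for row in csv.reader(source):
            if not row:
                continue  # skip completely empty lines
            # Tabs or line breaks inside a value would corrupt a strict
            # tab-delimited file, so flatten each run of them to one space.
            cleaned = [re.sub(r"[\t\r\n]+", " ", field) for field in row]
            target.write("\t".join(cleaned) + "\n")
            rows_written += 1
    return rows_written
```

The return value gives an independent row count to compare against what the loader reports. The resulting file can then be loaded with a tab field terminator and a `\n` row terminator.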

In case you really want to dig into it, the FreeTDS project includes a utility, freebcp, that does return an error status and can be used reliably in batch. Last I looked, though, you'll have to compile it first; it's distributed only as source code.
