
SQL Server 2016 bulk insert csv and generate sha1 from specific columns and insert into a column

I am fairly new to SQL Server and I am running the following command from inside my C# application:

DECLARE @SQLString nvarchar(4000);
-- QUOTENAME(@p0, '''') wraps the file path in single quotes, since BULK INSERT expects a string literal
SET @SQLString = N'BULK INSERT events FROM ' + QUOTENAME(@p0, '''') + N' WITH ( BATCHSIZE = 50000, CODEPAGE = ''65001'', FIELDTERMINATOR = ''|'', ROWTERMINATOR = ''\n'' )';
EXECUTE sp_executesql @SQLString;

With @p0 being the path to the .csv file.

Right now I am generating a sha1 hash as a BigInteger from a combination of 3 columns in my C# code and writing it into a new column in the csv file (which is the primary key).

Now I have seen that it is possible to generate the sha1 hash inside SQL Server. Is this possible while bulk inserting?

E.g. bulk insert the csv file; for each row, take columns X, Y, Z and generate a sha1 hash; convert it to a BigInteger and insert it into column P?
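For illustration, a minimal sketch of what that hash looks like in T-SQL (the N'X|Y|Z' literal stands in for the concatenated column values). Note that HASHBYTES('SHA1', ...) returns a 20-byte varbinary, so converting it to bigint keeps only 8 of those bytes, which raises the collision risk beyond SHA-1 itself:

-- SHA-1 digest of an example string; HASHBYTES returns varbinary(20)
SELECT HASHBYTES('SHA1', N'X|Y|Z')                  AS FullHash,
       -- converting to bigint truncates the 20-byte digest to 8 bytes
       CONVERT(bigint, HASHBYTES('SHA1', N'X|Y|Z')) AS HashAsBigInt;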

EDIT: I am trying the answer from @Nick.McDermaid, but I can't seem to get it working:

CREATE TABLE [dbo].[test] (
[User] [nvarchar](185) NOT NULL,
[Stat] [nvarchar](25) NOT NULL,
[Name] [nvarchar](max) NOT NULL,
[HashByte] AS (convert(bigint, HASHBYTES('SHA1', CONVERT(nvarchar(max), [User]+[Stat]+[Name])))),
CONSTRAINT [PK_dbo.test] PRIMARY KEY ([HashByte]))

I get an incorrect syntax error.

I suggest you take a step back here. Firstly: are you saying that if there is a one character difference in your varchar(max) field (2 GB), then the record is a unique record? What is the purpose of defining unique records here, and what happens when a "duplicate" appears?

In this situation I recommend you follow this very commonly used staging pattern, which ends up being used for most data import processes:

  1. BULK INSERT into a staging table that has no PK
  2. Use INSERT to only insert unique records into your real table
  3. Your real table has a simple int identity PK and is guaranteed to be unique on the required columns due to step 2
  4. You can identify records with issues in your staging table

From my experiments, it's not possible to create any kind of unique constraint/unique index/PK on this particular calculated field.

Some sample code for steps 1 & 2 would be:

-- Bulk insert into staging table
BULK INSERT staging.events FROM.....

-- Only insert records from staging that aren't already there
-- ([User] is bracketed because USER is a reserved keyword in T-SQL)
INSERT INTO dbo.events ([User], Stat, Name)
SELECT [User], Stat, Name
FROM staging.events S
WHERE NOT EXISTS (
   SELECT * FROM dbo.events E
   WHERE E.[User] = S.[User]
   AND E.Stat = S.Stat
   AND E.Name = S.Name
)

Now, if you like, you can write another update back to the staging table that identifies duplicates.
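A minimal sketch of that update, assuming you add a nullable IsDuplicate flag column to staging.events (the column name is illustrative):

-- Flag staging rows whose key columns already exist in the real table
UPDATE S
SET S.IsDuplicate = 1
FROM staging.events S
WHERE EXISTS (
   SELECT * FROM dbo.events E
   WHERE E.[User] = S.[User]
   AND E.Stat = S.Stat
   AND E.Name = S.Name
)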

It really comes back to the meaning of 'duplicate'. If you have a one character difference in your Name column, is that a duplicate?

I got it working with:

CREATE TABLE dbo.test (
[User] nvarchar(185) NOT NULL,
[Stat] nvarchar(25) NOT NULL,
[Name] nvarchar(max) NOT NULL,
[HashByte] AS CAST(HASHBYTES('SHA1', CONCAT([User],[Stat],[Name])) AS BIGINT) PERSISTED,
CONSTRAINT [PK_dbo.test] PRIMARY KEY ([HashByte]) )

And then use the IGNORE_DUP_KEY index option!
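For reference, a sketch of what that option looks like on the primary key above, assuming duplicate hashes should be silently skipped rather than raise an error during the load:

-- With IGNORE_DUP_KEY = ON, inserting a row whose HashByte already
-- exists produces a warning and skips the row instead of failing
CONSTRAINT [PK_dbo.test] PRIMARY KEY ([HashByte])
WITH (IGNORE_DUP_KEY = ON)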
