
Faster way to insert records into database

So I currently have a database table of about 70,000 names. What I want to do is take 3000 random records from that table and insert them into another table, where each name is paired in a row with every other name. In other words, the new table should look like this:

john, jerry
john, alex
john, sam
jerry, alex
jerry, sam
alex, sam

This means that for n names I should be adding n(n-1)/2 rows to the table (the sum 1 + 2 + ... + (n-1)). My current strategy is to use two nested for loops to add these rows one at a time, removing the first name from the list of names after each outer pass to ensure I don't end up with a duplicate record in a different order.
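To make the row count concrete, here is a minimal Python sketch of the pairing described above. The helper names `unordered_pairs` and `pair_count` are made up for illustration; `itertools.combinations` stands in for the two nested loops and yields each pair once.

```python
from itertools import combinations


def unordered_pairs(names):
    """Return every pair (a, b) exactly once, with a sorted before b."""
    return list(combinations(sorted(names), 2))


def pair_count(n):
    """Number of unordered pairs among n distinct names: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# The four names from the example in the question.
pairs = unordered_pairs(["john", "jerry", "alex", "sam"])
```

For the four names in the example this gives six pairs, and `pair_count(3000)` gives 4,498,500, which is the number of rows a 3000-name sample would need.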

My question is this: is there a faster way to do this, perhaps through parallel for loops, PLINQ, or some other option that I have not mentioned?

You will need to figure out the random part yourself; the pairing can be done with a self-join:

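-- "names" is a placeholder for your table; TABLE itself is a reserved word
select t1.name, t2.name
from names t1
join names t2
  on t1.name < t2.name   -- keeps each pair once, in one order only
order by t1.name, t2.name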

You need to materialize the NEWID() sample first, for example in a table variable, so that both sides of the join read the same random rows:

declare @t table (name varchar(10) primary key);
insert into @t (name) values 
       ('Adam')
     , ('Bob')
     , ('Charlie')
     , ('Den')
     , ('Eric')
     , ('Fred');
declare @top table (name varchar(10) primary key);
insert into @top (name)
select top (4) name from @t order by NEWID();

select * from @top;

select a.name, b.name
from @top a  
join @top b 
  on a.name < b.name  
order by a.name, b.name;
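The same idea can be tried end to end as a rough sketch in Python. Everything here is an assumption made for illustration: SQLite stands in for SQL Server (so `ORDER BY RANDOM() LIMIT ?` replaces `TOP (n) ... ORDER BY NEWID()`), and the table names `names`, `sample`, and `pairs` and the function `build_pairs` are invented. The point is that the pairs are written by a single `INSERT ... SELECT` rather than by a loop in the application.

```python
import sqlite3


def build_pairs(names, sample_size):
    """Sample names once, then insert every unordered pair in one statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE sample (name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE pairs (name1 TEXT, name2 TEXT)")
    conn.executemany("INSERT INTO names (name) VALUES (?)", [(n,) for n in names])
    # Materialize the random sample so both sides of the join see the same rows.
    conn.execute(
        "INSERT INTO sample (name) "
        "SELECT name FROM names ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    )
    # Set-based insert: the self-join generates all pairs inside the database.
    conn.execute(
        "INSERT INTO pairs (name1, name2) "
        "SELECT a.name, b.name FROM sample a JOIN sample b ON a.name < b.name"
    )
    conn.commit()
    return conn


conn = build_pairs(["Adam", "Bob", "Charlie", "Den", "Eric", "Fred"], 4)
pair_rows = conn.execute(
    "SELECT name1, name2 FROM pairs ORDER BY name1, name2"
).fetchall()
```

With a sample of 4 names, `pair_rows` holds 4 * 3 / 2 = 6 rows; which names appear varies from run to run.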

Given a table "Names" with an nvarchar(50) column "Name" containing this data:

Adam
Bob
Charlie
Den
Eric
Fred

This query:

-- Work out the fraction we need
DECLARE @frac AS float;
SELECT @frac = CAST(35000 AS float) / 70000;

-- Get roughly that sample size
WITH ts AS (
SELECT Name FROM Names
WHERE @frac >= CAST(CHECKSUM(NEWID(), Name) & 0x7FFFFFFF AS float) / CAST (0X7FFFFFFF AS int)
)

-- Match each entry in the sample with all the other entries
SELECT x.Name + ', ' + y.Name
FROM ts AS X
CROSS JOIN
Names AS Y
WHERE x.Name <> y.Name

produces results of the form

Adam, Bob
Adam, Charlie
Adam, Den
Adam, Eric
Adam, Fred
Charlie, Adam
Charlie, Bob
Charlie, Den
Charlie, Eric
Charlie, Fred
Den, Adam
Den, Bob
Den, Charlie
Den, Eric
Den, Fred

The results will vary by run; a sample of 3000 out of 70000 will have approximately 3000 * 70000 result rows. I used 35000/70000 (a fraction of 0.5) because my test table had only 6 rows; for the real table, set the fraction to 3000/70000.

If you want only the names from the sample used, change CROSS JOIN Names AS Y to CROSS JOIN ts AS Y, and there will then be approximately 3000 * 3000 result rows. Note that both orderings of each pair appear in the results; replacing x.Name <> y.Name with x.Name < y.Name keeps only one of them.
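The "approximately" in these row counts comes from the sampling method: each row is kept independently with the given probability, so the sample size itself varies. A small Python sketch of that behaviour follows; `bernoulli_sample` and `cross_pairs` are hypothetical helpers, and Python's `random` stands in for the CHECKSUM(NEWID(), Name) expression.

```python
import random


def bernoulli_sample(names, fraction, rng):
    """Keep each name independently with probability `fraction`."""
    return [n for n in names if rng.random() <= fraction]


def cross_pairs(sample, everyone):
    """Ordered pairs (x, y): x from the sample, y any other name."""
    return [(x, y) for x in sample for y in everyone if x != y]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
names = ["Adam", "Bob", "Charlie", "Den", "Eric", "Fred"]
sample = bernoulli_sample(names, 0.5, rng)

rows = cross_pairs(sample, names)     # like CROSS JOIN Names AS Y
within = cross_pairs(sample, sample)  # like CROSS JOIN ts AS Y
```

If the sample ends up with k of the n names, `rows` has k * (n - 1) entries and `within` has k * (k - 1), with each pair appearing in both orders.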

Reference: The random sample method was taken from the section "Important" in Limiting Result Sets by Using TABLESAMPLE.

Using a Number table to simulate names, in a single query with a triangular join:

WITH all_names 
     AS (SELECT n, 
                'NAME_' + Cast(n AS VARCHAR(20)) NAME 
         FROM   number 
         WHERE  n < 70000), 
     rand_names 
     AS (SELECT TOP 3000 * 
         FROM   all_names 
         ORDER  BY Newid()), 
     ordered_names 
     AS (SELECT Row_number() 
                  OVER ( 
                    ORDER BY NAME) rw_num, 
                NAME 
         FROM   rand_names) 
SELECT n1.NAME, 
       n2.NAME 
FROM   ordered_names n1 
       INNER JOIN ordered_names n2 
               ON n2.rw_num > n1.rw_num   
