
Fill database tables with a large amount of test data

I need to load a table with a large amount of test data. This is to be used for testing performance and scaling.

How can I easily create 100,000 rows of random/junk data for my database table?

You could also use a stored procedure. Consider the following table as an example:

CREATE TABLE your_table (id int NOT NULL PRIMARY KEY AUTO_INCREMENT, val int);

Then you could add a stored procedure like this:

DELIMITER $$
CREATE PROCEDURE prepare_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i <= 100000 DO
    INSERT INTO your_table (val) VALUES (i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

When you call it, you'll have 100k records:

CALL prepare_data();
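The loop above inserts sequential values, one autocommitted statement per row, which is slow. A minimal variation (a sketch, not part of the original answer — the procedure name prepare_random_data and the RAND() range are my own choices) wraps the loop in a single transaction and inserts random junk values instead, which better matches the "random/junk data" asked for:

```sql
DELIMITER $$
CREATE PROCEDURE prepare_random_data()
BEGIN
  DECLARE i INT DEFAULT 1;
  -- One commit at the end instead of one per row speeds this up considerably.
  START TRANSACTION;
  WHILE i <= 100000 DO
    -- Random integer in [0, 999999] as junk data.
    INSERT INTO your_table (val) VALUES (FLOOR(RAND() * 1000000));
    SET i = i + 1;
  END WHILE;
  COMMIT;
END$$
DELIMITER ;

CALL prepare_random_data();
```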

To clone an existing row many times (data duplication), you could use:

DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 100000 DO
    INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
    SELECT `user_id`, `page_id`, `name`, `description`, `created`
    FROM `table`
    WHERE id = 1; -- requires an existing seed row with id = 1
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
DROP PROCEDURE insert_test_data;

Here is a solution using pure math and SQL: seed a few rows, then repeatedly run an insert that doubles the table by selecting it into itself. (The transcript below was captured after several initial doublings, hence the starting row counts.)

create table t1(x int primary key auto_increment);
insert into t1 () values (),(),();

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 1265 rows affected (0.01 sec)
Records: 1265  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 2530 rows affected (0.02 sec)
Records: 2530  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 5060 rows affected (0.03 sec)
Records: 5060  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 10120 rows affected (0.05 sec)
Records: 10120  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 20240 rows affected (0.12 sec)
Records: 20240  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 40480 rows affected (0.17 sec)
Records: 40480  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 80960 rows affected (0.31 sec)
Records: 80960  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 161920 rows affected (0.57 sec)
Records: 161920  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 323840 rows affected (1.13 sec)
Records: 323840  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 647680 rows affected (2.33 sec)
Records: 647680  Duplicates: 0  Warnings: 0
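Each run doubles the table, so the total is always a power-of-two multiple of the seed count. If you need an exact row count, one option (my own addition, relying on the fact that x stays dense from 1..N in this scheme) is to overshoot and trim:

```sql
-- x runs 1..N with no gaps, so this leaves exactly 100,000 rows
-- (assuming the table had already grown past 100,000).
DELETE FROM t1 WHERE x > 100000;
```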

If you want more control over the data, try something like this (in PHP):

<?php
// Note: the legacy mysql_* API is deprecated (removed in PHP 7);
// mysqli is used here instead.
$conn = mysqli_connect(...);
$num = 100000;

$sql = 'INSERT INTO `table` (`col1`, `col2`, ...) VALUES ';
for ($i = 0; $i < $num; $i++) {
  mysqli_query($conn, $sql . generate_test_values($i));
}
?>

where the function generate_test_values would return a string formatted like "('val1', 'val2', ...)". If this takes a long time, you can batch the rows so you're not making so many db calls, eg:

for ($i = 0; $i < $num; $i += 10) {
  $values = array();
  for ($j = 0; $j < 10; $j++) {
    $values[] = generate_test_values($i + $j);
  }
  mysqli_query($conn, $sql . join(", ", $values));
}

This would only run 10,000 queries, each adding 10 rows.

Try filldb.

You can either post your schema or use an existing schema, generate dummy data, export it from that site, and import it into your database.

create table mydata as select * from information_schema.columns;
insert into mydata select * from mydata;
-- repeating the insert 11 times will give you at least 6 million rows in the table.

Apologies if this is out of place, but I wanted to offer some explanation of this code, since the answer above is much more useful once you understand what it does.

The first line creates a table called mydata and generates its column layout from information_schema, which stores metadata about your MySQL server. In this case it pulls from information_schema.columns, so the new table automatically gets all the columns of that system view — very handy.

The second line is an INSERT statement that targets the new mydata table and inserts the information_schema data into it. The last line is just a comment suggesting you run the insert a few more times if you want to generate more data.

In my testing, one execution of this script generated 6,956 rows of data. If you need a quick way to generate some records, this isn't a bad method.

However, for more advanced testing you might want to ALTER the table to add an auto-incrementing primary key so that you have a unique index; a database without a primary key is a sad database, and it tends to produce unpredictable results since there can be duplicate entries.

All that being said, I wanted to offer some insight into this code because I found it useful and think others might as well, if someone just takes the time to explain what it is doing. Most people aren't fans of executing code without knowing what it will do, even from a trusted source. I'm not offering this as "the answer", but as another source of information to support the answer above.
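A minimal sketch of that ALTER (my own addition — it assumes the mydata table created above, and the column name id is arbitrary):

```sql
-- Add an auto-incrementing primary key so every row gets a unique id.
ALTER TABLE mydata
  ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
```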

I really like the mysql_random_data_load utility from Percona; you can find more details about it here.

mysql_random_data_load is a utility that connects to the MySQL database and fills the specified table with random data. If foreign keys are present in the table, they will also be correctly filled.

This utility has a cool feature: the speed of data generation can be limited.

For example, to generate 30,000 records in the sakila.film_actor table at a speed of 500 records per second, you need the following command:

mysql_random_data_load sakila film_actor 30000 --host=127.0.0.1 --port=3306 --user=my_user --password=my_password --qps=500 --bulk-size=1

I have successfully used this tool to simulate a workload in a test environment by running it on multiple threads at different speeds against different tables.

I've created a Ruby script that can insert into practically "any" database that doesn't have foreign-key validations between tables, and it inserts random data, so you can benchmark the database with some data in it. I'll be creating a gem later (when I have some free time) from this gist: https://gist.github.com/carlosveucv/137ea32892ef96ab496def5fcd21858b

This is a more performant modification to @michalzuber's answer. The only difference is removing the WHERE id = 1, so that the inserts accumulate on each run.

Because each iteration doubles the table, the number of records produced from a single seed row is 2^n.

So 10 iterations give 2^10 = 1024 records, 20 iterations give 2^20 = 1,048,576 records, and so on.

DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i <= 10 DO
    INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
    SELECT `user_id`, `page_id`, `name`, `description`, `created`
    FROM `table`;
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
DROP PROCEDURE insert_test_data;
