Automatically populate table with dummy data in MySQL

I have a MySQL table that I want to populate with some dummy data for testing (50+ rows).

This table has a foreign key to another table, so the dummy data must reference rows in that table, but the references should be random, i.e. the rows can't all share the same foreign key.

It also has a date_added field, which I want to populate with a random date within a one-year span, e.g. any date in the year 2010.

My table structure is:

id, customer_id, date_added, title, total_cost

where id is the primary key, customer_id is the foreign key and date_added is the date field.

What is the best way of doing this? I'd prefer to do it directly in MySQL, but if that isn't practical, my site runs on Python, so a way of doing it in Python would do.

I would not do this in MySQL without outside help from an application written in Python.

There are several requirements built into your statement that are best expressed in a procedural style. SQL is a set-based language; I don't think it lends itself as nicely to the task at hand.

You'll want an application to take in data from a source, do whatever randomization and PII removal you need, and then construct the test data according to your requirements.
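A minimal sketch of that construction step in Python, assuming the column names from the question (customer_id, date_added, title, total_cost). The function name `make_orders`, the title text and the cost range are made up for illustration, and the list of customer ids would normally be read from the referenced table rather than hard-coded.

```python
import random
from datetime import date, timedelta


def make_orders(customer_ids, count, year=2010, seed=None):
    """Build `count` dummy order rows as plain dicts."""
    rng = random.Random(seed)  # a seed makes the "random" data repeatable
    first_day = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - first_day).days
    rows = []
    for n in range(1, count + 1):
        rows.append({
            # pick an existing customer so the foreign key stays valid
            "customer_id": rng.choice(customer_ids),
            # random day offset inside the chosen year
            "date_added": first_day + timedelta(days=rng.randrange(days_in_year)),
            "title": f"Test order {n}",
            "total_cost": round(rng.uniform(1, 500), 2),
        })
    return rows


# Hypothetical customer ids; in practice: SELECT id FROM <customer table>
orders = make_orders(customer_ids=[1, 2, 3, 4, 5], count=50, seed=42)
```

Each dict can then be turned into an insert statement or handed to a database driver.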

If it's a database intended just for test, you might consider an in-memory database that you can populate, modify all you like, and then blow away for your next test. I'm thinking about something like Hypersonic or Derby or TimesTen.
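To illustrate the throwaway-database idea from Python, here is a sketch that uses the standard library's sqlite3 module with an in-memory database. SQLite is a stand-in chosen for this example, not one of the products named above, and its SQL dialect differs from MySQL's. The table and column names follow the question where it gives them; the `customers` table is an assumption.

```python
import random
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")  # the database lives only as long as this connection
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        date_added TEXT NOT NULL,
        title TEXT NOT NULL,
        total_cost REAL NOT NULL
    )
""")

rng = random.Random(1)
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [(f"Customer {n}",) for n in range(1, 11)],
)
customer_ids = [row[0] for row in conn.execute("SELECT id FROM customers")]

rows = [
    (
        rng.choice(customer_ids),  # random but valid foreign key
        (date(2010, 1, 1) + timedelta(days=rng.randrange(365))).isoformat(),
        f"Test order {n}",
        round(rng.uniform(1, 500), 2),
    )
    for n in range(1, 51)
]
conn.executemany(
    "INSERT INTO orders (customer_id, date_added, title, total_cost) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
# Calling conn.close() discards everything, ready for the next test run.
```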

Quick and dirty solution:

drop table if exists orders;
drop table if exists customers;

create table customers
(
  cust_id int unsigned not null auto_increment primary key,
  name varchar(255) not null
)
engine=innodb;

create table orders
(
  order_id int unsigned not null auto_increment primary key,
  cust_id int unsigned not null,
  order_date datetime not null,
  foreign key (cust_id) references customers(cust_id) on delete cascade
)
engine=innodb;


drop procedure if exists load_test_data;

delimiter #

create procedure load_test_data()
begin

  declare v_max_customers int unsigned default 0;
  declare v_max_orders int unsigned default 0;
  declare v_counter int unsigned default 0;
  declare v_rnd_cust_id int unsigned default 0;
  declare v_base_date datetime;

  -- empty both tables (truncating a table referenced by a foreign key
  -- needs the checks switched off temporarily)
  set foreign_key_checks = 0;

  truncate table orders;
  truncate table customers;

  set foreign_key_checks = 1;

  set v_base_date = '2010-01-01 00:00:00';

  set v_max_customers = 1000;
  set v_max_orders = 10000;

  -- customers: after the truncate, cust_id runs from 1 to v_max_customers
  start transaction;

  set v_counter = 0;
  while v_counter < v_max_customers do
    insert into customers (name) values (concat('Customer ', v_counter + 1));
    set v_counter = v_counter + 1;
  end while;

  commit;

  -- orders: random customer and random moment within 2010
  start transaction;

  set v_counter = 0;
  while v_counter < v_max_orders do

    -- random cust_id between 1 and v_max_customers
    set v_rnd_cust_id = floor(1 + (rand() * v_max_customers));

    -- 2010 has 365 days = 8760 hours, so the offset stays inside the year
    insert into orders (cust_id, order_date)
      values (v_rnd_cust_id, v_base_date + interval floor(rand() * 8760) hour);
    set v_counter = v_counter + 1;
  end while;

  commit;

end #

delimiter ;

call load_test_data();

select * from customers order by cust_id desc limit 10;
select * from orders order by order_id desc limit 10;

For testing business rules, I actually prefer carefully thought-out data over random data. I load it either via Excel -> CSV -> database or with manually created insert statements.

One row for each boundary condition, say:

  • One customer without orders
  • One customer with zero total cost
  • One customer with foreign characters in the name (because I always forget to deal with it)
  • One customer with a max-length name
  • One customer with loads of orders (to make sure that the GUI still looks nice)

It makes it really easy to run regression tests because you "know" what the data should look like.
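The boundary rows listed above could be written down by hand roughly like this. It is only a sketch: the ids, names, the 255-character limit and the helper `orders_per_customer` are assumptions made for the example, not part of the original schema.

```python
# Hypothetical hand-written fixture rows, one customer per boundary condition.
MAX_NAME_LENGTH = 255  # assumption: the name column is varchar(255)

customers = [
    {"id": 1, "name": "No Orders"},            # customer without orders
    {"id": 2, "name": "Zero Total"},           # customer with zero total cost
    {"id": 3, "name": "Zoë Müller-Øster"},     # non-ASCII characters in the name
    {"id": 4, "name": "X" * MAX_NAME_LENGTH},  # max-length name
    {"id": 5, "name": "Bulk Buyer"},           # customer with very many orders
]

orders = [{"customer_id": 2, "total_cost": 0.0}]
orders += [{"customer_id": 5, "total_cost": 10.0} for _ in range(500)]


def orders_per_customer(customers, orders):
    """Count orders per customer id, including customers with none."""
    counts = {c["id"]: 0 for c in customers}
    for order in orders:
        counts[order["customer_id"]] += 1
    return counts


counts = orders_per_customer(customers, orders)
```

Because every row is known in advance, a regression test can assert exact values, for example that customer 1 has no orders and customer 5 has 500.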

For performance testing, you can do pretty well with random data as long as the data distribution is realistic (which affects the usefulness of indexes). If you have very advanced requirements, your best bet is to use some software built for this purpose.

But often you can generate all the data you need from a single table of integers and clever use of built-in functions:

  • rand() -> Generates a random number.
  • mod() -> Used to create repeating sequences (1,2,3,1,2,3)
  • lpad() and rpad() -> For padding strings to specified lengths
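The same idea can be mirrored in Python to see how much data falls out of a plain run of integers. This is a sketch rather than the SQL itself: the `%` operator plays the role of mod(), `str.rjust` is used similarly to lpad(), and `random.Random.random` stands in for rand(). The status values and the function name are invented for the example.

```python
import random

rng = random.Random(7)  # seeded so repeated runs give the same values
STATUSES = ["new", "paid", "shipped"]  # hypothetical repeating values


def row_from_integer(n):
    """Derive one dummy row from a single integer."""
    return {
        "code": str(n).rjust(6, "0"),                 # similar to lpad(n, 6, '0')
        "status": STATUSES[n % 3],                    # like mod(n, 3): cycles through 3 values
        "total_cost": round(rng.random() * 100, 2),   # like rand() * 100
    }


rows = [row_from_integer(n) for n in range(1, 7)]
```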

This question is old and already answered, but you may still want to know about this stored procedure for loading dummy data into MySQL. It runs from within MySQL and auto-populates dummy data according to the column data types.

All you need to specify is the database name, the table name and the number of records to populate.

call populate('sakila','film',1000,'N');

(You might want to follow the Git repo for updates as well.)

If you really want to get serious about setting up test data, you should go the fixture route. This will help you set up a pretty nice development environment, and it may integrate very nicely into your website's framework if you're using one.

You can find a link to the documentation of the fixture module here

If you think that's a little too much work to get everything working, look into the MySQLdb module, which will help you insert data into your table.
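A rough sketch of the insert step, written against the generic Python DB-API that MySQLdb implements. The table name `orders`, the function name and the connection details in the comments are assumptions. MySQLdb uses `%s` as its parameter placeholder, which is what the statement below relies on.

```python
def insert_orders(conn, rows):
    """Insert (customer_id, date_added, title, total_cost) tuples in one batch.

    `conn` is any DB-API connection whose driver uses %s placeholders,
    as MySQLdb does.
    """
    sql = (
        "INSERT INTO orders (customer_id, date_added, title, total_cost) "
        "VALUES (%s, %s, %s, %s)"
    )
    cur = conn.cursor()
    try:
        cur.executemany(sql, rows)  # the driver escapes the values
        conn.commit()
    finally:
        cur.close()
    return len(rows)


# Usage with MySQLdb (hypothetical credentials and database name):
#   import MySQLdb
#   conn = MySQLdb.connect(host="localhost", user="test", passwd="secret", db="testdb")
#   insert_orders(conn, [(1, "2010-03-14", "Test order 1", 19.99)])
#   conn.close()
```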

It may be in poor taste to link back to Stack Overflow, but someone has already answered the date question you are asking. You can find that here.
