How to select unique records using limit in multiple MySQL statements

I have 10 separate PHP cron jobs running, each of which selects 100 records at a time from the same table using

SELECT `username` FROM `data` WHERE `id` <> '' LIMIT 0,100

How do I ensure that each of these recordsets is unique? Is there a way of ensuring that each cron job does not select the same 100 records?

username is unique, if that helps.

Thanks

Jonathan

  • You can either choose a different 100 records in each job:

    limit 100,100, limit 200,100 ...

  • Or choose 100 randomly:

    ...FROM data where id <> '' ORDER BY RAND() LIMIT 0,100

  • If you want to ensure that a record is not chosen twice, you'll have to mark that record ("make it dirty"), so that other cron jobs query only records that have not been chosen already. Just add another boolean column called chosen, and set it to true once a given record has been chosen. You'll have to run the cron jobs one by one, or use a locking or mutex mechanism to ensure they don't run in parallel and race each other.
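The first option (a different LIMIT offset per job) can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL so the example runs on its own, the helper name page_for_job and the 0-based job_index are hypothetical, and an ORDER BY on id is added because LIMIT without ORDER BY does not guarantee a stable row order. The pages stay disjoint only as long as rows are not inserted or deleted between the jobs' queries.

```python
import sqlite3

PAGE_SIZE = 100  # rows per job, as in the question


def page_for_job(conn, job_index, page_size=PAGE_SIZE):
    """Return the usernames in the page assigned to job_index (0-based).

    page_for_job and job_index are hypothetical names for this sketch.
    """
    offset = job_index * page_size
    rows = conn.execute(
        # ORDER BY makes the paging deterministic across jobs
        "SELECT username FROM data ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return [row[0] for row in rows]


# Demo data: in-memory SQLite standing in for the MySQL `data` table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO data (username) VALUES (?)",
    [(f"user{i}",) for i in range(1000)],
)

# Ten jobs, each reading its own page of 100 rows
pages = [page_for_job(conn, job) for job in range(10)]
```

Each job needs to know its own index, for example from a command-line argument passed by cron. If the table changes while the jobs run, the marking approach is the safer choice.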

What you could do is 'mark' the records each job is going to use - the trick would be ensuring there's no race condition in marking them. Here's one way to do that.

create table job
(
    job_id int not null auto_increment,
    #add any other fields for a job you might want
    primary key(job_id)
);

# add a job_id column to data
alter table data add column job_id int not null default 0, add index(job_id);

Now, when you want to get 100 data rows to work on, get a unique job_id by inserting a row into job and obtaining the automatically generated id. Here's how you might do this in the mysql command line client; it is easy to see how it adapts to code:

insert into job (job_id) values(0);
set @myjob=last_insert_id();

Then, mark a hundred rows whose job_id is currently 0:

update data set job_id=@myjob where job_id=0 limit 100;

Now, you can take your time and process all rows where job_id=@myjob, safe in the knowledge no other process will touch them.

No doubt you'll need to tailor this to suit your problem, but this illustrates how you can use simple features of MySQL to avoid a race condition among parallel processes competing for access to the same records.
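As a rough illustration of how the claim-then-process steps might look in application code, here is a minimal sketch. It makes several assumptions: Python's built-in sqlite3 stands in for MySQL so the example is self-contained, the function name claim_batch is hypothetical, and because stock SQLite does not accept UPDATE ... LIMIT, the limit is moved into a subquery. Against MySQL you would keep the UPDATE ... LIMIT statement shown above and read the id with last_insert_id().

```python
import sqlite3


def claim_batch(conn, batch_size=100):
    """Create a job row, tag up to batch_size unclaimed rows with its id,
    and return (job_id, usernames). claim_batch is a hypothetical helper."""
    with conn:  # one transaction: commits on success, rolls back on error
        job_id = conn.execute("INSERT INTO job DEFAULT VALUES").lastrowid
        conn.execute(
            # SQLite stand-in for: update data set job_id=? where job_id=0 limit ?
            "UPDATE data SET job_id = ? WHERE id IN "
            "(SELECT id FROM data WHERE job_id = 0 ORDER BY id LIMIT ?)",
            (job_id, batch_size),
        )
    rows = conn.execute(
        "SELECT username FROM data WHERE job_id = ? ORDER BY id", (job_id,)
    ).fetchall()
    return job_id, [row[0] for row in rows]


# Demo schema mirroring the tables above, with 250 unclaimed rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_id INTEGER PRIMARY KEY AUTOINCREMENT)")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, username TEXT UNIQUE, "
    "job_id INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO data (username) VALUES (?)",
    [(f"user{i}",) for i in range(250)],
)

first_id, first = claim_batch(conn)
second_id, second = claim_batch(conn)
third_id, third = claim_batch(conn)  # only 50 unclaimed rows remain
```

Each call gets its own job_id, so the batches cannot overlap, and a call made after the table is exhausted simply returns an empty list.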
