Optimizing query and optimizing table

`CREATE TABLE emailAddress
(
ID int NOT NULL AUTO_INCREMENT,
EMAILID varchar(255),
LastIDfetched int,
PRIMARY KEY (ID)
);

SELECT LastIDfetched FROM emailAddress WHERE ID=1;    -- say this value is x
SELECT EMAILID FROM emailAddress WHERE ID>x AND ID<=x+100;
UPDATE emailAddress SET LastIDfetched=x+100 WHERE ID=1;`

Basically, I am trying to fetch all the email IDs from the database using multiple computers running in parallel, so that no email ID is fetched by two computers.

What is the best way to do this? There are millions of email IDs. In the example above, one query fetches 100 email IDs, but the batch size can vary depending on the need.

My suggestion would be to partition the queries by autoincrement ID. You may not get an exact split of records across the computers if there are gaps in your autoincrement sequence, but it should be pretty good.

One approach is to simply look at the remainder of the autoincrement ID when divided by the number of computers, and have each machine grab all rows with a certain remainder.

SELECT `EMAILID`
FROM `emailAddress`
WHERE ID % X = Y

Here X is the number of computers you are using, and Y is an integer between 0 and X - 1 that is unique to each machine running the query.

The con here is that this query cannot use an index, so if you need to run it a lot, or on a production system taking traffic, it could be problematic.

Another approach would be to determine the number of rows in the table and split the queries into groups.
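
The remainder split can be checked on a small table. The following is only a sketch: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL (an assumption, chosen so the example runs anywhere), and the helper names `fetch_share` and `demo` are made up for illustration.

```python
import sqlite3


def fetch_share(conn, machines, machine_index):
    """Return the EMAILID values whose ID % machines equals machine_index."""
    rows = conn.execute(
        "SELECT EMAILID FROM emailAddress WHERE ID % ? = ? ORDER BY ID",
        (machines, machine_index),
    )
    return [email for (email,) in rows]


def demo():
    # In-memory SQLite table standing in for the MySQL table (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emailAddress (ID INTEGER PRIMARY KEY, EMAILID TEXT)"
    )
    conn.executemany(
        "INSERT INTO emailAddress (ID, EMAILID) VALUES (?, ?)",
        [(i, f"user{i}@example.com") for i in range(1, 11)],
    )
    # Three hypothetical machines, Y = 0, 1, 2.
    return [fetch_share(conn, 3, y) for y in range(3)]
```

With IDs 1 to 10 and three machines, the shares hold 3, 4, and 3 rows, and every row lands in exactly one share. Gaps in the ID sequence would make the shares less even, but they would still not overlap.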

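SELECT COUNT(`ID`) FROM `emailAddress`; -- get the row count; we will call it A below

SELECT `EMAILID`
FROM `emailAddress`
ORDER BY ID ASC
LIMIT (A/X) * Y, (A/X)
-- offset, row count: compute both numbers in the client first,
-- because MySQL's LIMIT does not accept arithmetic expressions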

Here again, X is the number of machines, and Y is a unique integer for each machine (from 0 to X - 1).

The benefit here is that you are able to use the index on ID. The downside is that you may miss some rows if the number of rows grows between the initial count query and the queries that retrieve the data.
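
Since the LIMIT values have to be plain integers by the time the query reaches MySQL, each machine could compute its own offset and row count first. A minimal sketch, assuming the row count A has already been fetched; `limit_clause` and `build_query` are hypothetical helper names, and giving the leftover rows to the last machine when A is not divisible by X is an added assumption, not part of the approach as described above.

```python
def limit_clause(total_rows, machines, machine_index):
    """Return (offset, row_count) for one machine's LIMIT clause."""
    if machines < 1 or not 0 <= machine_index < machines:
        raise ValueError("machine_index must be between 0 and machines - 1")
    chunk = total_rows // machines          # A / X, rounded down
    offset = chunk * machine_index          # (A / X) * Y
    if machine_index == machines - 1:
        # Assumption: the last machine also takes the remainder rows.
        chunk = total_rows - offset
    return offset, chunk


def build_query(total_rows, machines, machine_index):
    """Build the SELECT for one machine with literal LIMIT values."""
    offset, count = limit_clause(total_rows, machines, machine_index)
    # Both values are integers computed above, so formatting them in is safe.
    return (
        "SELECT `EMAILID` FROM `emailAddress` "
        f"ORDER BY ID ASC LIMIT {int(offset)}, {int(count)}"
    )
```

For example, with 10 rows and 3 machines, the machines would get the (offset, count) pairs (0, 3), (3, 3), and (6, 4). Rows inserted after the count was taken are still not covered by any machine.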

I don't understand your `LastIDfetched` field, but it looks like an unnecessary mechanism for achieving what can easily be achieved as noted above.
