How do I write a query that efficiently handles a large number of records?

Question

Suppose I have a table X with a billion records.

Table X has three columns: ProductID, AccountID and ContractID.

ProductID and AccountID together form the composite key of table X.

In memory, I have a map (say, a Java HashMap) that contains a million (ProductID, AccountID) pairs.

I want to create a file that contains every (ProductID, AccountID) pair along with its corresponding ContractID.

I could use a for loop and query the table once for each (ProductID, AccountID) pair, but that would mean a million queries, which would be very inefficient.

So the question is: how do I write a query that does this efficiently? Can such a query be written at all? Is there another way?

Answer 1

If speed and efficiency are of importance, then a query with a million "unions" or a million items in an IN clause is not going to be acceptable.

A more performant solution is to bulk insert your ProductID/AccountID pairs into a temp table; let's call it #temp. I won't describe the bulk insert because it is database dependent. Then you can run a simple join query:

SELECT X.ProductID, X.AccountID, X.ContractID
FROM X
INNER JOIN #temp t ON t.ProductID = X.ProductID AND t.AccountID = X.AccountID
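
Since the question mentions a Java HashMap, here is one possible JDBC sketch of this approach; the answer itself leaves the bulk insert open. It assumes SQL Server-style #temp syntax, integer IDs and an integer ContractID, and the class name TempTableLookup is hypothetical. It loads the pairs with ordinary PreparedStatement batching (addBatch/executeBatch) rather than a vendor-specific bulk-load API, which may well be faster. A #temp table is only visible to the session that created it, so every statement runs on the same Connection.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Collection;
import java.util.Map;

// Sketch only: the #temp syntax (SQL Server) and INT column types are assumptions.
final class TempTableLookup {
    static final String CREATE_TEMP =
        "CREATE TABLE #temp (ProductID INT NOT NULL, AccountID INT NOT NULL, "
        + "PRIMARY KEY (ProductID, AccountID))";
    static final String INSERT_TEMP =
        "INSERT INTO #temp (ProductID, AccountID) VALUES (?, ?)";
    // Columns are qualified with X. because both tables have ProductID and AccountID.
    static final String JOIN_QUERY =
        "SELECT X.ProductID, X.AccountID, X.ContractID FROM X "
        + "INNER JOIN #temp t ON t.ProductID = X.ProductID AND t.AccountID = X.AccountID";
    static final int BATCH_SIZE = 10_000; // rows per executeBatch call; tune for your driver

    // Loads the pairs into #temp, runs the join once and writes one CSV line per match.
    // Returns the number of lines written.
    static int export(Connection conn,
                      Collection<? extends Map.Entry<Integer, Integer>> pairs,
                      PrintWriter out) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(CREATE_TEMP);
        }
        try (PreparedStatement ps = conn.prepareStatement(INSERT_TEMP)) {
            int pending = 0;
            for (Map.Entry<Integer, Integer> p : pairs) {
                ps.setInt(1, p.getKey());   // ProductID
                ps.setInt(2, p.getValue()); // AccountID
                ps.addBatch();
                if (++pending == BATCH_SIZE) { // send a full batch
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the remainder
            }
        }
        int rows = 0;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(JOIN_QUERY)) {
            while (rs.next()) {
                out.println(rs.getInt(1) + "," + rs.getInt(2) + "," + rs.getInt(3));
                rows++;
            }
        }
        return rows;
    }
}
```

A call would look like TempTableLookup.export(conn, pairs.entrySet(), writer) for a Map<Integer, Integer> named pairs. Note that a Map can hold only one AccountID per ProductID; if a product can appear with several accounts, any Collection of Map.Entry objects works too.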

Answer 2

Without knowing the exact SQL dialect, I'd use an INNER JOIN:

SELECT X.ProductID, X.AccountID, X.ContractID
FROM X
INNER JOIN MemTable m ON m.ProductID = X.ProductID AND m.AccountID = X.AccountID

You have now added Java as a tag, so am I right in thinking that the map lives in your Java application? If so, it gets tough: you may actually need to query the database a million times.

On the other hand, you could construct a string containing one single, large SQL statement like this:

SELECT * FROM X WHERE ProductID IN (...) AND AccountID IN (...)

where your loop just fills in comma-separated lists of product IDs and account IDs. Then you issue that command once. Assuming both IDs are numeric, the command would look like this, for example:

SELECT * FROM X WHERE ProductID IN (1,2,3,4) AND AccountID IN (99,88,77)
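
A minimal sketch of building that statement from the in-memory pairs, assuming integer IDs; the class name InListQuery is hypothetical, and it selects the three columns the question needs instead of *. Because the values are Integers, concatenating them into the SQL text cannot inject anything; string IDs would need bind parameters instead. Duplicate IDs are dropped so each list stays as short as possible.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Builds the single IN/IN statement, assuming numeric IDs.
// The two lists are matched independently, not as (ProductID, AccountID) pairs.
final class InListQuery {
    static String build(Collection<? extends Map.Entry<Integer, Integer>> pairs) {
        if (pairs.isEmpty()) {
            throw new IllegalArgumentException("IN () with no values is not valid SQL");
        }
        Set<Integer> products = new LinkedHashSet<>(); // drops duplicates, keeps first-seen order
        Set<Integer> accounts = new LinkedHashSet<>();
        for (Map.Entry<Integer, Integer> p : pairs) {
            products.add(p.getKey());
            accounts.add(p.getValue());
        }
        return "SELECT ProductID, AccountID, ContractID FROM X"
            + " WHERE ProductID IN (" + join(products) + ")"
            + " AND AccountID IN (" + join(accounts) + ")";
    }

    private static String join(Set<Integer> ids) {
        return ids.stream().map(String::valueOf).collect(Collectors.joining(","));
    }
}
```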

EDIT
Please note that my last suggestion may have the following flaw (you'll have to decide whether it is actually a problem for you):

Assume your map contains (1, 99) and (3, 77), but table X also contains the records (1, 77) and (3, 99). My query will return (1, 99), (3, 77), (1, 77) and (3, 99), because the two IDs are matched individually rather than as a pair.

So every row that contains any combination of the given ProductIDs and AccountIDs will be returned.
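
The flaw can be reproduced in plain Java with the example data above. This is a hypothetical illustration (InListFlawDemo is not a real API, and ContractID is left out): inListMatch mimics the two independent IN filters, while pairMatch keeps only the exact pairs. Running a pairMatch-style check on the client over the IN-list result is one way to discard the extra rows, at the cost of transferring them first.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Each entry is one (ProductID, AccountID) key; ContractID is omitted for brevity.
final class InListFlawDemo {
    // Mimics WHERE ProductID IN (...) AND AccountID IN (...): each ID is checked on its own.
    static List<Map.Entry<Integer, Integer>> inListMatch(
            List<Map.Entry<Integer, Integer>> tableRows, List<Map.Entry<Integer, Integer>> wanted) {
        Set<Integer> products = wanted.stream().map(Map.Entry::getKey).collect(Collectors.toSet());
        Set<Integer> accounts = wanted.stream().map(Map.Entry::getValue).collect(Collectors.toSet());
        return tableRows.stream()
            .filter(r -> products.contains(r.getKey()) && accounts.contains(r.getValue()))
            .collect(Collectors.toList());
    }

    // Keeps only rows whose (ProductID, AccountID) pair was actually requested.
    static List<Map.Entry<Integer, Integer>> pairMatch(
            List<Map.Entry<Integer, Integer>> tableRows, List<Map.Entry<Integer, Integer>> wanted) {
        Set<Map.Entry<Integer, Integer>> keys = new HashSet<>(wanted);
        return tableRows.stream().filter(keys::contains).collect(Collectors.toList());
    }
}
```

With the table rows (1, 99), (3, 77), (1, 77) and (3, 99) and the wanted pairs (1, 99) and (3, 77), inListMatch returns all four rows while pairMatch returns only the two requested ones.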

Assuming your database system allows it, you could expand the SELECT statement into something like this:

SELECT ProductID, AccountID, ContractID FROM X WHERE ProductID = <ValueFromMap> AND AccountID = <ValueFromMap>
UNION ALL
SELECT ProductID, AccountID, ContractID FROM X WHERE ...
UNION ALL
...
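
Writing that statement out by hand for a million pairs is impractical, so here is a sketch that generates it with bind parameters (UnionAllQuery is a hypothetical name). You would then bind the pairs in order, for example with setInt(2 * i + 1, productId) and setInt(2 * i + 2, accountId) for the i-th pair (counting from 0). Databases cap statement size and the number of bind parameters, so in practice this would be done for one chunk of pairs at a time; the right chunk size depends on your database.

```java
import java.util.Collections;

// Generates the UNION ALL statement with placeholders instead of literal values.
final class UnionAllQuery {
    static final String BRANCH =
        "SELECT ProductID, AccountID, ContractID FROM X WHERE ProductID = ? AND AccountID = ?";

    // One SELECT per pair; the result needs 2 * pairCount bound parameters.
    static String build(int pairCount) {
        if (pairCount < 1) {
            throw new IllegalArgumentException("need at least one pair");
        }
        return String.join("\nUNION ALL\n", Collections.nCopies(pairCount, BRANCH));
    }
}
```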

Answer 3

I guess your in-memory map is in your Java program? If so, I don't think there is an efficient solution that is database independent. The best I can think of is to look for continuous ID ranges in your map, so that you can write SELECT * FROM X WHERE ID >= xx AND ID <= yy and avoid selecting duplicate IDs.
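
A sketch of the range idea, assuming the ranges are taken over ProductID (IdRanges and its methods are hypothetical names). It sorts the IDs, collapses consecutive values into runs and turns each run into a range predicate. A range on ProductID also returns AccountIDs you did not ask for, so the rows would still need to be checked against the map.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

final class IdRanges {
    // Collapses IDs into maximal runs of consecutive values, e.g. {1, 2, 3, 5} -> [1, 3], [5, 5].
    static List<int[]> ranges(Collection<Integer> ids) {
        List<int[]> out = new ArrayList<>();
        int[] current = null;
        for (int id : new TreeSet<>(ids)) { // sorted, duplicates removed
            if (current != null && id == current[1] + 1) {
                current[1] = id;              // extend the current run
            } else {
                current = new int[] {id, id}; // start a new run
                out.add(current);
            }
        }
        return out;
    }

    // One predicate per run; ProductID as the ranged column is an assumption.
    static String whereClause(List<int[]> ranges) {
        if (ranges.isEmpty()) {
            throw new IllegalArgumentException("no ranges");
        }
        List<String> parts = new ArrayList<>();
        for (int[] r : ranges) {
            parts.add("(ProductID >= " + r[0] + " AND ProductID <= " + r[1] + ")");
        }
        return "WHERE " + String.join(" OR ", parts);
    }
}
```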


3 answers

solution1 2 2013-06-12 15:29:13

solution2 1 ACCEPTED 2013-06-12 15:01:31

solution3 0 2013-06-12 15:07:15
