
Getting a huge amount of data from database in the most efficient way

In my application, I have to read a huge amount of data. Once I have all of it, I put it in a list and process it.

Now I am wondering whether there is anything at all I can do to speed up getting the data from the database. My database sits on a different server, and I am using Java to interact with it.

I don't have a definite size for the data, i.e. a specific number of rows that I need to process. I also hear I can go for multithreading, but then how do I go about it? I won't know how to partition my data, since its size is unknown. For example, if the following pseudocode were to be applied:

for(i=0 to number of partition) // Not certain on the number of partitions
    create new thread and get data.

Or maybe I can hash the data on the basis of some attribute and later tell each thread to fetch a particular index of the map, but then how do I map it before even fetching the data?

What possible solutions can I look into, and how do I go about them? Let me know if you need any more info.

Thanks.

I hear I can go for multithreading, but then how do I go about it?

This is definitely a good choice for speeding up queries against a remote server.
In tasks like this, the IO with the server is usually the main bottleneck, and with multithreading one can "ask for" multiple rows concurrently, effectively reducing the IO wait times.

but then how do I go about it?

The idea is to split the work into smaller tasks. Have a look at Java's high-level concurrency API for more details.
One solution is to let each thread read a chunk of size M from the server, and have each thread repeat this while the server still has data. Something like this (for each thread):

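Object data = "start";              // any non-null value, so the loop is entered
int chunk = threadNumber;           // thread k starts at chunk k
while (data != null) {
  data = requestChunk(chunk);       // null once the chunk is past the end of the data
  chunk += numberOfThreads;         // skip the chunks owned by the other threads
}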

I assume here that once you are out of bounds, the server returns null (or requestChunk() handles that case and returns null).
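To make the per-thread loop above concrete, here is a minimal sketch of how it might be wired up with an ExecutorService. Everything in it is an assumption made for illustration: the class name StridedChunkReader, the chunk size of 10, and the in-memory FAKE_TABLE that stands in for the remote database. In a real application requestChunk() would run a query against the server instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StridedChunkReader {

    // Hypothetical chunk size M (rows per request).
    static final int CHUNK_SIZE = 10;

    // Stand-in for the remote table; a real version would query the database.
    static final List<String> FAKE_TABLE = new ArrayList<>();
    static {
        for (int i = 0; i < 25; i++) {
            FAKE_TABLE.add("row-" + i);
        }
    }

    // Hypothetical fetch: returns chunk number `chunk`, or null once past the end.
    static List<String> requestChunk(int chunk) {
        int from = chunk * CHUNK_SIZE;
        if (from >= FAKE_TABLE.size()) {
            return null;
        }
        int to = Math.min(from + CHUNK_SIZE, FAKE_TABLE.size());
        return new ArrayList<>(FAKE_TABLE.subList(from, to));
    }

    // Reads every chunk with numberOfThreads workers; returns the rows in chunk order.
    static List<String> readAll(int numberOfThreads) {
        // Sorted, thread-safe map from chunk index to the rows of that chunk.
        Map<Integer, List<String>> chunks = new ConcurrentSkipListMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (int t = 0; t < numberOfThreads; t++) {
                final int threadNumber = t;
                workers.add(pool.submit(() -> {
                    // Worker t reads chunks t, t + n, t + 2n, ... until null comes back.
                    int chunk = threadNumber;
                    List<String> data;
                    while ((data = requestChunk(chunk)) != null) {
                        chunks.put(chunk, data);
                        chunk += numberOfThreads;
                    }
                }));
            }
            for (Future<?> worker : workers) {
                try {
                    worker.get(); // wait for the worker and surface its failure, if any
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("worker failed", e);
                }
            }
        } finally {
            pool.shutdown();
        }
        List<String> rows = new ArrayList<>();
        for (List<String> part : chunks.values()) {
            rows.addAll(part);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = readAll(4);
        System.out.println(rows.size() + " rows read");
    }
}
```

Note that the total number of rows never has to be known up front: each worker simply stops when its next chunk comes back as null. Keying the results by chunk index is only there so the rows can be put back in their original order; if order does not matter, each worker could process its chunk right away instead.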

Or maybe I can hash the data on the basis of some attribute and later tell each thread to fetch a particular index of the map

If you need to iterate over the data and retrieve all of it, hashing is usually a bad solution. It is very cache-inefficient, and the overhead is just too big for these cases.
