简体   繁体   中英

Listing more than 10 million records from Oracle With C#

I have a database that contains more than 100 million records. I am running a query that contains more than 10 million records. This process takes too much time so i need to shorten this time. I want to save my obtained record list as a csv file. How can I do it as quickly and optimum as possible? Looking forward your suggestions. Thanks.

I'm assuming that your query is already constrained to the rows/columns you need, and makes good use of indexing.

At that scale, the only critical thing is that you don't try to load it all into memory at once; so forget about things like DataTable , and most full-fat ORMs (which typically try to associate rows with an identity-manager and/or change-manager). You would have to use either the raw IDataReader (from DbCommand.ExecuteReader ), or any API that builds a non-buffered iterator on top of that (there are several; I'm biased towards dapper). For the purposes of writing CSV, the raw data-reader is probably fine.

Beyond that: you can't make it go much faster, since you are bandwidth constrained. The only way you can get it faster is to create the CSV file at the database server , so that there is no network overhead.

Chances are pretty slim you need to do this in C#. This is the domain of bulk data loading/exporting (commonly used in Data Warehousing scenarios).

Many (free) tools (I imagine even Toad by Quest Software) will do this more robustly and more efficiently than you can write it in any platform.

I have a hunch that you don't actually need this for an end-user (the simple observation is that the department secretary doesn't actually need to mail out copies of that; it is too large to be useful in that way).

I suggest using the right tool for the job. And whatever you do,

  • donot roll your own datatype conversions
  • use CSV with quoted literals and think of escaping the double quotes inside these
  • think of regional options (IOW: always use InvariantCulture for export/import!)

"This process takes too much time so i need to shorten this time. "

This process consists of three sub-processes:

  1. Retrieving > 10m records
  2. Writing records to file
  3. Transferring records across the network (my presumption is you are working with a local client against a remote database)

Any or all of those issues could be a bottleneck. So, if you want to reduce the total elapsed time you need to figure out where the time is spent. You will probably need to instrument your C# code to get the metrics.

If it turns out the query is the problem then you will need to tune it. Indexes won't help here as you're retrieving a large chunk of the table (> 10%), so increasing the performance of a full table scan will help. For instance increasing the memory to avoid disk sorts. Parallel query could be useful (if you have Enterprise Edition and you have sufficient CPUs). Also check that the problem isn't a hardware issue (spindle contention, dodgy interconnects, etc).

Can writing to a file be the problem? Perhaps your disk is slow for some reason (eg fragmentation) or perhaps you're contending with other processes writing to the same directory.

Transferring large amounts of data across a network is obviously a potential bottleneck. Are you certain you're only sending relevenat data to the client?

An alternative architecture: use PL/SQL to write the records to a file on the dataserver, using bulk collect to retrieve manageable batches of records, and then transfer the file to where you need it at the end, via FTP, perhaps compressing it first.

The real question is why you need to read so many rows from the database (and such a large proportion of the underlying dataset). There are lots of approaches which should make this scenario avoidable, obvious ones being synchronous processing, message queueing and pre-consolidation.

Leaving that aside for now...if you're consolidating the data or sifting it, then implementing the bulk of the logic in PL/SQL saves having to haul the data across the network (even if it's just to localhost, there's still a big overhead). Again if you just want to dump it out into a flat file , implementing this in C# isn't doing you any favours.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM