
Best approach for script to download thousands of files in C#, SQL 2008

I have a script which runs through a database table and downloads a file for each row, adds to a results table in memory, then bulk uploads all the results back to the database once finished.

The problem I have is that there could be thousands of files to download, and the script could time out or fail halfway through.

Is there a better approach to this, maybe involving threading or asynchronous calls?

Threading seems to be the way to go. You should have one or more threads that read rows from the database (if you use several, partition the reads between them) and put them into some sort of concurrent collection (either one of the built-in .NET 4 ones, or a custom one that you build or download). Then have a set of worker threads that take items from that collection and download the files. If a download times out, the worker should put the task back into the collection.

This is the basic producer-consumer threading pattern. You can easily find many examples on Google.
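A minimal sketch of that producer-consumer arrangement, using the .NET 4 BlockingCollection<T> as the concurrent collection. The names DownloadPipeline, Job, and the download delegate are hypothetical stand-ins for illustration; a real script would read the jobs from the table and call something like WebClient.DownloadFile inside the delegate. A failed job goes back onto the queue until it has used up maxAttempts.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class DownloadPipeline
{
    // One unit of work (hypothetical shape): the source row, its URL, attempts so far.
    public sealed class Job
    {
        public int RowId;
        public string Url;
        public int Attempts;
    }

    // 'download' is an assumed delegate that fetches one file and throws on timeout or error.
    // Returns rowId -> true (downloaded) or false (gave up after maxAttempts).
    public static ConcurrentDictionary<int, bool> Run(
        IEnumerable<Job> rows, Action<Job> download, int consumerCount, int maxAttempts)
    {
        var results = new ConcurrentDictionary<int, bool>();
        using (var queue = new BlockingCollection<Job>())
        {
            // Counts unfinished jobs, plus one token held by the producer while it is
            // still reading rows. The queue is closed only when this reaches zero.
            int pending = 1;
            Action finishOne = () =>
            {
                if (Interlocked.Decrement(ref pending) == 0) queue.CompleteAdding();
            };

            // Producer: reads rows and puts them on the queue.
            var producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var job in rows)
                    {
                        Interlocked.Increment(ref pending);
                        queue.Add(job);
                    }
                }
                finally { finishOne(); } // release the producer's token
            }, TaskCreationOptions.LongRunning);

            // Consumers: take jobs, download, and requeue on failure.
            var consumers = new Task[consumerCount];
            for (int i = 0; i < consumerCount; i++)
            {
                consumers[i] = Task.Factory.StartNew(() =>
                {
                    foreach (var job in queue.GetConsumingEnumerable())
                    {
                        bool ok;
                        job.Attempts++;
                        try { download(job); ok = true; }
                        catch (Exception) { ok = false; }

                        if (ok) { results[job.RowId] = true; finishOne(); }
                        else if (job.Attempts < maxAttempts) { queue.Add(job); } // retry later
                        else { results[job.RowId] = false; finishOne(); }
                    }
                }, TaskCreationOptions.LongRunning);
            }

            producer.Wait();
            Task.WaitAll(consumers);
        }
        return results;
    }
}
```

The returned dictionary plays the role of the in-memory results table, so it could be bulk uploaded once Run returns.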

If the timeouts are caused by the number of files to download, I recommend using the ThreadPool for asynchronous calls.

First, set the maximum number of parallel threads using ThreadPool.SetMaxThreads. Then queue the tasks with ThreadPool.QueueUserWorkItem. This caps the number of threads running concurrently: any work items beyond the maximum wait until one of the threads in the pool is free.
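A rough sketch of this ThreadPool approach. PooledDownloader and the download delegate are hypothetical names. Two caveats are worth keeping in mind: SetMaxThreads applies to the whole process, and it returns false without changing anything if the requested value is lower than the number of processors, so the sketch checks the return value and restores the previous limits afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PooledDownloader
{
    // 'download' is an assumed delegate that fetches one URL and throws on failure.
    // Returns the number of downloads that completed without throwing.
    public static int Run(IList<string> urls, Action<string> download, int maxWorkerThreads)
    {
        int oldWorkers, oldIo;
        ThreadPool.GetMaxThreads(out oldWorkers, out oldIo);

        // Process-wide setting; the runtime may refuse it (returns false).
        bool capApplied = ThreadPool.SetMaxThreads(maxWorkerThreads, oldIo);
        if (!capApplied)
            Console.Error.WriteLine("SetMaxThreads refused the value; using existing limits.");

        int succeeded = 0;
        try
        {
            // CountdownEvent lets the caller block until every work item has finished.
            using (var done = new CountdownEvent(urls.Count))
            {
                foreach (var url in urls)
                {
                    ThreadPool.QueueUserWorkItem(state =>
                    {
                        try
                        {
                            download((string)state);
                            Interlocked.Increment(ref succeeded);
                        }
                        catch (Exception)
                        {
                            // A real script would record the failure so it can be retried.
                        }
                        finally
                        {
                            done.Signal();
                        }
                    }, url);
                }
                done.Wait();
            }
        }
        finally
        {
            // Put the previous limits back so the rest of the process is unaffected.
            if (capApplied) ThreadPool.SetMaxThreads(oldWorkers, oldIo);
        }
        return succeeded;
    }
}
```

Because the limit is shared with everything else that uses the pool, this is probably best suited to a dedicated console program rather than code hosted inside a larger application.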

Perhaps this may be a good candidate for a cloud app. Bandwidth, queueing for async processing, scalable on a timely basis?

Is it possible to persist the results back to the database after each download operation? That way you could compare the rows of the two tables to pick up where you left off after a timeout or error. Threading might make it faster, but it won't solve the problem you asked about on its own.
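To illustrate the idea, here is a small sketch in which the database is hidden behind a hypothetical IResultStore interface, with an in-memory stand-in so the example runs on its own. Against SQL Server, GetPendingRows would presumably be a query along the lines of a LEFT JOIN from the source table to the results table that keeps only rows with no result yet (the table and column names are yours to fill in).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the two tables.
public interface IResultStore
{
    // Source rows (id -> url) that have no row in the results table yet.
    IEnumerable<KeyValuePair<int, string>> GetPendingRows();

    // Writes one result row immediately, e.g. a single INSERT.
    void SaveResult(int rowId);
}

// In-memory stand-in for the database, for demonstration only.
public sealed class InMemoryStore : IResultStore
{
    private readonly SortedDictionary<int, string> _source;
    private readonly HashSet<int> _results = new HashSet<int>();

    public InMemoryStore(IDictionary<int, string> source)
    {
        _source = new SortedDictionary<int, string>(source);
    }

    public IEnumerable<KeyValuePair<int, string>> GetPendingRows()
    {
        var pending = new List<KeyValuePair<int, string>>();
        foreach (var row in _source)
            if (!_results.Contains(row.Key)) pending.Add(row);
        return pending;
    }

    public void SaveResult(int rowId) { _results.Add(rowId); }

    public int ResultCount { get { return _results.Count; } }
}

public static class ResumableDownloader
{
    // 'download' is an assumed delegate; an exception from it aborts the run,
    // which is how this sketch models the script dying halfway through.
    // Returns the number of rows processed in this run.
    public static int Run(IResultStore store, Action<string> download)
    {
        int processed = 0;
        foreach (var row in store.GetPendingRows())
        {
            download(row.Value);
            store.SaveResult(row.Key); // persisted before moving to the next file
            processed++;
        }
        return processed;
    }
}
```

Running it again after a failure only touches the rows that have no saved result. A variation would be to catch per-file errors and save them with a status column, so that one bad URL does not stop the whole run.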

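If the script has plenty of time to run (for example, it runs once a day at midnight), then the simplest solution is to save the results to the database after every X downloads.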
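A sketch of that batching approach, with hypothetical names throughout (BatchedDownloader, download, flush). The flush delegate stands in for whatever writes a batch to the database; in the original script that would presumably be the existing bulk upload, just called more often.

```csharp
using System;
using System.Collections.Generic;

public static class BatchedDownloader
{
    // rows:      id -> url pairs read from the source table
    // download:  assumed delegate, returns true if the file was fetched
    // flush:     assumed delegate that writes one batch of (id, succeeded) pairs
    // Returns the number of results handed to flush.
    public static int Run(
        IEnumerable<KeyValuePair<int, string>> rows,
        Func<string, bool> download,
        int batchSize,
        Action<IList<KeyValuePair<int, bool>>> flush)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");

        var buffer = new List<KeyValuePair<int, bool>>(batchSize);
        int flushed = 0;
        try
        {
            foreach (var row in rows)
            {
                buffer.Add(new KeyValuePair<int, bool>(row.Key, download(row.Value)));
                if (buffer.Count >= batchSize)
                {
                    flush(buffer.ToArray()); // copy, so the buffer can be reused
                    flushed += buffer.Count;
                    buffer.Clear();
                }
            }
        }
        finally
        {
            // Save the partial batch too, including when an exception ends the run early.
            if (buffer.Count > 0)
            {
                flush(buffer.ToArray());
                flushed += buffer.Count;
            }
        }
        return flushed;
    }
}
```

With this shape, at most batchSize - 1 results can be lost if the process is killed outright, since the finally block only helps when the failure surfaces as an exception.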

I wouldn't do this type of operation in a script. Instead I'd have some type of program, probably running as a Windows service, which would actually perform the job of downloading all of those files and updating the relevant records.

If it is only supposed to run when a user clicks a button, then I'd have the service monitor a table for a command to execute. Once it detects that command, it kicks off the job.

Not sure of the pattern name here, but it's basically like a job queuing system.
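The polling loop at the heart of such a service might look roughly like this. CommandPoller, tryTakeCommand, and execute are hypothetical names: tryTakeCommand stands in for a query against the command table that claims and returns the next pending command (or null if there is none), and execute stands in for the download job itself.

```csharp
using System;
using System.Threading;

public static class CommandPoller
{
    // Polls until 'stop' is cancelled. Returns how many commands were executed.
    public static int Run(
        Func<string> tryTakeCommand,
        Action<string> execute,
        TimeSpan pollInterval,
        CancellationToken stop)
    {
        int executed = 0;
        while (!stop.IsCancellationRequested)
        {
            string command = tryTakeCommand(); // null means nothing is waiting
            if (command != null)
            {
                execute(command);
                executed++;
                continue; // check for more work straight away
            }

            // Nothing to do: sleep for one interval, waking early if asked to stop.
            stop.WaitHandle.WaitOne(pollInterval);
        }
        return executed;
    }
}
```

In a Windows service, Run would typically be started on a background thread from OnStart and the token cancelled from OnStop. The button click then only needs to insert a row into the command table.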


 