
Best approach for script to download thousands of files in C#, SQL 2008

I have a script that runs through a database table, downloads a file for each row, adds the outcome to a results table held in memory, and then bulk uploads all the results back to the database once it has finished.

The problem is that there could be thousands of files to download, and the script could time out or fail halfway through.

Is there a better approach to this, maybe involving threading or asynchronous calls?

Threading seems to be the way to go. You should have one or more threads that read rows from the database (if you want several, partition the reads accordingly) and put them into some sort of concurrent collection (either one of the built-in .NET 4 collections, or a custom one that you build or download). Then you should have a set of worker threads that take items from that collection and download the file. If a download times out, the worker should put the task back into the collection.

This is the basic producer-consumer threading pattern. You can easily find many examples on Google.
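A minimal sketch of that pattern, assuming .NET 4's `BlockingCollection<T>` as the concurrent collection. The names `WorkItem`, `DownloadPipeline`, `fetch`, and `maxAttempts` are hypothetical, and `fetch` is a stand-in for the real download call that is assumed to throw `TimeoutException` on a timeout. For brevity a single producer (the calling thread) queues the rows; the `pending` counter is one possible way to know when no item can be re-queued any more.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical unit of work: one database row to download.
public sealed class WorkItem
{
    public int RowId;
    public string Url;
    public int Attempts;
}

public static class DownloadPipeline
{
    public static ConcurrentDictionary<int, string> Run(
        IEnumerable<WorkItem> rows, Func<string, string> fetch,
        int consumerCount, int maxAttempts)
    {
        var results = new ConcurrentDictionary<int, string>();
        // Unbounded queue, so putting a timed-out item back never blocks a consumer.
        var queue = new BlockingCollection<WorkItem>();
        // Starts at 1: the producer holds one "token" until every row is queued.
        int pending = 1;
        Action finishOne = () =>
        {
            // Whoever finishes the last outstanding item closes the queue.
            if (Interlocked.Decrement(ref pending) == 0) queue.CompleteAdding();
        };

        var consumers = new Task[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = Task.Factory.StartNew(() =>
            {
                foreach (WorkItem item in queue.GetConsumingEnumerable())
                {
                    string body = null;
                    bool retry = false;
                    try { body = fetch(item.Url); }
                    catch (TimeoutException) { retry = ++item.Attempts < maxAttempts; }
                    catch (Exception) { /* other failure: record null, do not retry */ }

                    if (retry) { queue.Add(item); continue; }   // back into the collection
                    results[item.RowId] = body;                 // null marks a failed row
                    finishOne();
                }
            }, TaskCreationOptions.LongRunning);
        }

        // Producer: queue every row, then release the producer's token.
        foreach (WorkItem row in rows)
        {
            Interlocked.Increment(ref pending);
            queue.Add(row);
        }
        finishOne();

        Task.WaitAll(consumers);
        return results;
    }
}
```

Rows that still time out after `maxAttempts` tries end up with a `null` result rather than being retried forever, which is a design choice of this sketch and not something the answer specifies.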

If the timeouts are caused by the number of files to download, I recommend using the ThreadPool for asynchronous calls.

First, set the maximum number of parallel threads with ThreadPool.SetMaxThreads. Then queue the tasks with ThreadPool.QueueUserWorkItem. This caps the number of pool threads running at the same time: any work items beyond the maximum wait until one of the threads in the pool has finished.
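A rough sketch of that approach follows. `PooledDownloads`, `RunAll`, and the `download` delegate are hypothetical names, and `download` stands in for the real per-file work. Two things worth knowing before relying on it: `ThreadPool.SetMaxThreads` applies to the whole process, and it returns `false` (changing nothing) if the requested value is lower than the number of processors, so the sketch checks the return value. A `CountdownEvent` is used here to wait for all queued items, which the answer does not mention.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PooledDownloads
{
    public static void RunAll(IList<string> urls, Action<string> download, int maxWorkers)
    {
        // Keep the current I/O completion thread limit; only change the worker limit.
        int currentWorkers, ioThreads;
        ThreadPool.GetMaxThreads(out currentWorkers, out ioThreads);
        if (!ThreadPool.SetMaxThreads(maxWorkers, ioThreads))
            Console.Error.WriteLine("SetMaxThreads was rejected; pool limits are unchanged.");

        if (urls.Count == 0) return;

        // Counts down once per finished work item so the caller can wait for all of them.
        using (var remaining = new CountdownEvent(urls.Count))
        {
            foreach (string url in urls)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try { download((string)state); }
                    catch (Exception ex) { Console.Error.WriteLine("Failed: " + ex.Message); }
                    finally { remaining.Signal(); }   // signal even when the download fails
                }, url);
            }
            remaining.Wait();
        }
    }
}
```

Because the limit is process-wide, anything else in the same process that uses the pool competes for the same threads.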

Perhaps this would be a good candidate for a cloud app. Bandwidth, queueing for async processing, scaling up as needed?

Is it possible to persist the results back to the database after each download operation? That way you could compare the rows of the two tables and pick up where you left off after a timeout or error. Threading might make it faster, but on its own it won't solve the problem you asked about.

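If the script has plenty of time to run (for example, it runs once a day at midnight), then the simplest solution is to save to the database after every X downloads.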
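The two suggestions above (compare against the results table to resume, and save every X downloads) could be combined roughly as follows. This is only a sketch: `CheckpointedRun`, `alreadyDone`, `fetch`, and `saveBatch` are hypothetical names, with `alreadyDone` standing for the row ids already present in the results table and `saveBatch` standing in for the real bulk insert into SQL Server.

```csharp
using System;
using System.Collections.Generic;

public static class CheckpointedRun
{
    // Returns the number of rows downloaded during this run.
    public static int Run(
        IEnumerable<KeyValuePair<int, string>> rows,          // (row id, url) pairs
        ISet<int> alreadyDone,                                // ids saved by earlier runs
        Func<string, string> fetch,                           // stand-in for the download
        Action<IList<KeyValuePair<int, string>>> saveBatch,   // stand-in for the bulk insert
        int batchSize)
    {
        var buffer = new List<KeyValuePair<int, string>>();
        int processed = 0;
        try
        {
            foreach (var row in rows)
            {
                if (alreadyDone.Contains(row.Key)) continue;   // resume: skip finished rows

                buffer.Add(new KeyValuePair<int, string>(row.Key, fetch(row.Value)));
                processed++;

                if (buffer.Count >= batchSize)
                {
                    saveBatch(buffer.ToArray());   // pass a copy, then reuse the buffer
                    buffer.Clear();
                }
            }
        }
        finally
        {
            // Flush whatever completed, even if a download threw,
            // so the next run can skip those rows.
            if (buffer.Count > 0) saveBatch(buffer.ToArray());
        }
        return processed;
    }
}
```

With this shape, a failure loses at most the download that was in flight, and rerunning the script with a fresh `alreadyDone` set continues from the first unsaved row.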

I wouldn't do this type of operation in a script. Instead I'd have some type of program, probably running as a Windows service, that actually performs the job of downloading all of those files and updating the relevant records.

If it is only supposed to run when a user clicks a button, then I'd have the service monitor a table for a command to execute. Once it detects that command, it kicks off the job.

Not sure of the pattern name here, but it's basically like a job queuing system.
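The polling loop inside such a service might look something like this sketch. `JobPoller`, `takeNextCommand`, and `runJob` are hypothetical: `takeNextCommand` is assumed to read (and mark as taken) the next pending row of the command table and to return `null` when there is none, and `runJob` is assumed to do the downloading.

```csharp
using System;
using System.Threading;

public sealed class JobPoller
{
    private readonly Func<string> _takeNextCommand;   // null means "nothing pending"
    private readonly Action<string> _runJob;
    private readonly TimeSpan _interval;

    public JobPoller(Func<string> takeNextCommand, Action<string> runJob, TimeSpan interval)
    {
        _takeNextCommand = takeNextCommand;
        _runJob = runJob;
        _interval = interval;
    }

    // Runs until the token is cancelled, e.g. from the service's OnStop.
    public void Run(CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            string command = _takeNextCommand();
            if (command != null)
            {
                try { _runJob(command); }
                catch (Exception ex) { Console.Error.WriteLine("Job failed: " + ex.Message); }
                continue;   // look for more work right away
            }
            // Nothing to do: sleep for the interval, but wake early if asked to stop.
            stop.WaitHandle.WaitOne(_interval);
        }
    }
}
```

A failed job is only logged here, so the loop keeps serving later commands. How a failed command should be marked in the table is left open.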
