How to FTP constantly incoming files

Ok, here's the situation... I have an application that generates about 8 files per second. Each file is 19-24 KB. This generates about 10 to 11 MB per minute. This question is not about how to FTP, because I already have that solution... The question is more about how to keep up with the flow of data (only 2 Mb of upload bandwidth in most cases, unless I am travelling to a client site that has a large pipe). I don't care if FTP takes longer to transfer than the rate of flow, but I want to know if anyone has an idea on how to batch the files so that when the FTP process is finished, it deletes just the files it transferred and then moves on to the next batch. Here is what I was thinking:

Multithread the app: the first thread runs the app, and the second thread is a timer that creates a text file every N minutes listing all the files created in that time span. Read that text file with a StreamReader, move the files it lists to another location (maybe a temp folder), then FTP those files, then delete the files, the folder and the text file... In the meantime, more text files are being written and more temp folders created. Does this sound feasible? I will take any suggestions under advisement; I'm just looking for the fastest and most reliable path.
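As a rough check of whether the link can keep up at all, here is the arithmetic from the numbers in the question. It assumes "2mb" means 2 megabits per second and uses decimal units (1 KB = 1000 bytes); per-file FTP command overhead is ignored, so the real margin will be smaller.

```latex
\begin{aligned}
\text{production} &= 8\,\tfrac{\text{files}}{\text{s}} \times (19\text{--}24)\,\tfrac{\text{KB}}{\text{file}}
  = 152\text{--}192\,\tfrac{\text{KB}}{\text{s}} \approx 9.1\text{--}11.5\,\tfrac{\text{MB}}{\text{min}} \\
\text{upload} &= \frac{2\,\text{Mbit/s}}{8\,\text{bit/byte}} = 250\,\tfrac{\text{KB}}{\text{s}} = 15\,\tfrac{\text{MB}}{\text{min}} \\
\text{utilisation} &= \frac{152\text{--}192}{250} \approx 61\text{--}77\,\%
\end{aligned}
```

Under that assumption the raw payload fits, but with limited headroom for per-file overhead such as repeated logins.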

Please don't ask to see the code; there is no reason to see it, considering we are working with hypotheticals.
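The batching idea in the question can be sketched roughly as follows. Instead of a text-file manifest, this variant (an assumption, not the asker's design) moves a snapshot of the files into a uniquely named batch folder, uploads that folder and then deletes it, so only transferred files are removed. `BatchMover`, `CreateBatch`, `SendBatch` and the `upload` delegate are hypothetical names, and the sketch assumes the producer has finished writing a file before it appears in the incoming folder.

```csharp
using System;
using System.IO;

// Sketch of the question's idea: snapshot whatever is in the incoming folder
// into a new batch folder, upload that batch, then delete it. Files that arrive
// after the snapshot stay in the incoming folder for the next batch.
public static class BatchMover
{
    // Moves every file currently in incomingDir into a new, uniquely named
    // folder under batchRoot. Returns that folder, or null if nothing was moved.
    public static string CreateBatch(string incomingDir, string batchRoot)
    {
        string[] files = Directory.GetFiles(incomingDir);
        if (files.Length == 0) return null;

        string batchDir = Path.Combine(batchRoot,
            $"batch_{DateTime.UtcNow:yyyyMMdd_HHmmss_fff}_{Guid.NewGuid():N}");
        Directory.CreateDirectory(batchDir);
        foreach (string file in files)
            File.Move(file, Path.Combine(batchDir, Path.GetFileName(file)));
        return batchDir;
    }

    // Uploads every file in the batch with the supplied upload delegate (a
    // stand-in for your existing FTP code), then deletes the whole folder.
    // If an upload throws, the folder is left in place so the batch can be retried.
    public static int SendBatch(string batchDir, Action<string> upload)
    {
        string[] files = Directory.GetFiles(batchDir);
        foreach (string file in files)
            upload(file);
        Directory.Delete(batchDir, recursive: true);
        return files.Length;
    }
}
```

A loop would call `CreateBatch` and then `SendBatch` repeatedly; at startup, any batch folders left over from a crash can simply be sent first.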

I would create a service and add the incoming files to a concurrent collection using FileSystemWatcher, System.Threading.Timer or both (FileSystemWatcher may miss files if its buffer is overrun, so it is a good idea to have a timer pick up any files that were missed). When files come in, I would move them into a separate folder and process them using .NET 4.0 tasks. I would then do any necessary post-processing in continuation steps of the original tasks. You can have continuation steps that handle any faults and different continuation steps that run on success. Each of these tasks runs on a thread-pool thread and is managed for you.
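A minimal sketch of the watcher-plus-timer collection, assuming the incoming files are only ever created (not renamed) in the folder. `IncomingFileCollector`, `Sweep` and `Forget` are hypothetical names; `FileSystemWatcher.Created`/`Error`, `System.Threading.Timer` and `ConcurrentDictionary.TryAdd` are the real .NET APIs it relies on. The dictionary ensures a file seen by both the watcher and the timer is queued only once.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Collects incoming file paths from a FileSystemWatcher and from a periodic
// timer sweep, which catches anything the watcher missed (e.g. buffer overrun).
public sealed class IncomingFileCollector : IDisposable
{
    private readonly string _dir;
    private readonly ConcurrentDictionary<string, byte> _seen = new ConcurrentDictionary<string, byte>();
    private readonly FileSystemWatcher _watcher;
    private readonly Timer _timer;

    // Consumers (e.g. upload tasks) dequeue full paths from here.
    public ConcurrentQueue<string> Pending { get; } = new ConcurrentQueue<string>();

    public IncomingFileCollector(string dir, TimeSpan sweepInterval)
    {
        _dir = dir;
        _watcher = new FileSystemWatcher(dir);
        _watcher.Created += (s, e) => Enqueue(e.FullPath);
        // Error is raised on buffer overflow (events were dropped): sweep instead.
        _watcher.Error += (s, e) => Sweep();
        _watcher.EnableRaisingEvents = true;
        _timer = new Timer(_ => Sweep(), null, sweepInterval, sweepInterval);
    }

    // Queues every file currently in the folder that has not been seen yet.
    public void Sweep()
    {
        foreach (string file in Directory.GetFiles(_dir))
            Enqueue(file);
    }

    // Call after a file has been uploaded and deleted so its name can be reused.
    public void Forget(string path) => _seen.TryRemove(Path.GetFullPath(path), out _);

    private void Enqueue(string path)
    {
        path = Path.GetFullPath(path);
        // TryAdd succeeds only once per path, so watcher + timer never double-queue.
        if (_seen.TryAdd(path, 0))
            Pending.Enqueue(path);
    }

    public void Dispose()
    {
        _timer.Dispose();
        _watcher.Dispose();
    }
}
```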

Here is an example of an OnlyOnFaulted continuation task from http://msdn.microsoft.com/en-us/library/dd997415.aspx. You could have a second continuation task that runs only on success.

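using System;
using System.Threading.Tasks;

var task1 = Task.Factory.StartNew(() =>
{
    throw new MyCustomException("Task1 faulted.");
})
.ContinueWith((t) =>
    {
        // Runs only because task1 faulted; t.Exception is an AggregateException.
        Console.WriteLine("I have observed a {0}",
            t.Exception.InnerException.GetType().Name);
    },
    TaskContinuationOptions.OnlyOnFaulted);
task1.Wait(); // keep the process alive until the continuation has run

class MyCustomException : Exception
{
    public MyCustomException(string message) : base(message) { }
}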
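The success-plus-fault pairing mentioned above could look roughly like this for an upload. `UploadContinuations`, `StartUpload` and the three delegates are hypothetical; only `ContinueWith` with `TaskContinuationOptions.OnlyOnRanToCompletion` / `OnlyOnFaulted` and `Task.WhenAll` are real APIs. Exactly one continuation runs and the other is cancelled, so the returned task waits for both to settle rather than using `WhenAny` (a cancelled continuation can finish before the real one has run).

```csharp
using System;
using System.Threading.Tasks;

public static class UploadContinuations
{
    // Starts an upload of one file and attaches two continuations: one that runs
    // only on success (e.g. delete the local file) and one only on failure
    // (e.g. log and leave the file for a retry).
    public static Task StartUpload(string path, Action<string> upload,
                                   Action<string> onSuccess, Action<string, Exception> onFault)
    {
        Task uploadTask = Task.Factory.StartNew(() => upload(path));

        Task success = uploadTask.ContinueWith(
            t => onSuccess(path),
            TaskContinuationOptions.OnlyOnRanToCompletion);

        Task failure = uploadTask.ContinueWith(
            t => onFault(path, t.Exception.InnerException), // also observes the exception
            TaskContinuationOptions.OnlyOnFaulted);

        // WhenAll ends up cancelled (one branch never runs), so attach a plain
        // continuation that completes once both branches have settled.
        return Task.WhenAll(success, failure).ContinueWith(_ => { });
    }
}
```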

Without knowing more details on why you need to keep all the work in a single application and deal with the threading complexity, one could argue for keeping the part that generates the files and the part that FTPs them in separate applications.

Separation of responsibility: ensure each application does only one job, and does it right and fast.

One service or app (desktop/web, whichever) generates the files.

Another service watches a folder, moves any incoming files into a temp folder, does what it needs to do, FTPs them and deletes them.

Since I don't know your setup or where the content for your files comes from, writing it as a single app, exactly as you suggested, might be the best choice.

Basically, to answer your question: yes, what you want to do sounds feasible. How you implement it, and what you are happy implementing, is up to you.

If you get stuck somewhere during implementation, feel free to post any issues in a new thread, with code samples showing how you implemented the specific feature and what issue you are experiencing.

Until then, hypothetically, any approach you feel can manage what you need to achieve is perfectly valid.

EDIT

Since you stated that you already have the application that generates the files working, and you already have an FTP solution, using two separate applications sounds more plausible.

All you need then is to wrap a service around the FTP solution, and happy days. There is no need to interfere with the original application that generates the files if it is already working.

Why risk breaking it, unless you must add the FTP feature to it and have no choice?

I worked on something similar in my old job. I had an external process dumping files into a certain folder. This is the algorithm that I followed:

  1. Have a FileSystemWatcher running on the source directory where the files get dumped.
  2. When a new file is found, process ALL files in the directory in ascending order of date (in your case, FTP the files).
  3. Once a file is processed, move it to a Processed directory (in your case, you can delete it).
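Steps 2 and 3 could be sketched as below. "Date" is interpreted here as the file's last write time (an assumption; creation time would work the same way), with the file name as a tie-breaker. `DirectorySweeper`, `ProcessAll` and the `process` delegate are hypothetical names; `process` stands in for the FTP upload.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DirectorySweeper
{
    // Processes every file in sourceDir, oldest first, then moves each one into
    // processedDir (in the asker's case it could be deleted instead).
    // Returns the number of files processed.
    public static int ProcessAll(string sourceDir, string processedDir, Action<string> process)
    {
        Directory.CreateDirectory(processedDir);
        var files = new DirectoryInfo(sourceDir).GetFiles()
                                                  .OrderBy(f => f.LastWriteTimeUtc)
                                                  .ThenBy(f => f.Name, StringComparer.Ordinal);
        int count = 0;
        foreach (FileInfo file in files)
        {
            process(file.FullName); // e.g. FTP the file
            // Moving only after success means a crash leaves unprocessed files
            // in sourceDir for the next sweep.
            file.MoveTo(Path.Combine(processedDir, file.Name));
            count++;
        }
        return count;
    }
}
```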

Things to consider:

  1. How many open FTP connections / processing threads can I have?
  2. FileSystemWatcher can and will raise events while another file is being processed. How do you handle them / send them to an appropriate thread?
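For point 2, one option (an assumption, not the answerer's code) is to coalesce triggers: a watcher event or timer tick that arrives while a sweep is running only requests one more pass instead of starting a second, concurrent sweep. `SweepCoordinator` and `Trigger` are hypothetical names; the `Interlocked`/`Volatile` calls are standard .NET.

```csharp
using System;
using System.Threading;

// Coalesces triggers (watcher events, timer ticks) so only one sweep runs at a
// time. A trigger arriving mid-sweep sets a flag, and the running sweep loops once more.
public sealed class SweepCoordinator
{
    private readonly Action _sweep;
    private int _running;   // 1 while a sweep is in progress
    private int _requested; // 1 if another pass has been requested

    public SweepCoordinator(Action sweep) => _sweep = sweep;

    public void Trigger()
    {
        Interlocked.Exchange(ref _requested, 1);
        // Only the caller that flips _running from 0 to 1 does the work.
        while (Interlocked.CompareExchange(ref _running, 1, 0) == 0)
        {
            try
            {
                while (Interlocked.Exchange(ref _requested, 0) == 1)
                    _sweep();
            }
            finally
            {
                Volatile.Write(ref _running, 0);
            }
            // A trigger may have landed after the inner loop exited but before
            // _running was cleared; if so, try to take ownership again.
            if (Volatile.Read(ref _requested) == 0) break;
        }
    }
}
```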

You need to insert a queue between the producer of the files and the consumer (the FTP host) so that files can be buffered if the producer is too fast. This requires some form of multithreading, or even multiple processes.

You propose a solution where the queue is the file system. That is quite possible, but in many cases not ideal. You have to get locking right to avoid transferring half-filled or empty files, etc. If you decide to use the file system, my experience is that FileSystemWatcher can't be used for that purpose. Using a timer to run a task, say, every second to pick up new files is much more reliable.
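One common way to "get locking right" with a file-system queue is to skip any file that cannot yet be opened exclusively. This is a minimal sketch with a hypothetical `FileReadiness.IsReady` helper. It assumes Windows sharing semantics, where a file the producer still has open for writing cannot be opened with `FileShare.None`; on other platforms the lock is advisory and may not be honoured by non-.NET writers.

```csharp
using System.IO;

public static class FileReadiness
{
    // Returns true if the file can be opened exclusively, which usually means
    // the producer has finished writing and closed it.
    public static bool IsReady(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
                return true;
        }
        catch (IOException)
        {
            // Still locked by the writer, or already gone (FileNotFoundException
            // derives from IOException): try again on the next timer tick.
            return false;
        }
    }
}
```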

Other queue technologies could be an in-memory queue (but then you have to think about how to handle crashes), a private Microsoft Message Queue (MSMQ) queue or a SQL Server Service Broker queue. The best solution very much depends on your requirements.

FTP is not really transactional, and you may decide to use a queue that is not transactional (both MSMQ and SQL Server Service Broker are transactional), but you should still try to build your applications around the concept of a transaction in which the file is created, queued and delivered. If it cannot be delivered, it is left in the queue and delivery is retried later. If it cannot be queued, the producer should retry queuing it, etc. You don't want a situation where a file is never delivered or is delivered twice.

It is not clear from your question how you are going to use FTP, but I would advise you to use an open-source or commercial library so you can use FTP directly from your application instead of shelling out to ftp.exe. This allows your application to behave intelligently, for example by keeping the FTP connection open to avoid excessive reconnects.
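For comparison, here is a minimal in-process upload using the framework's built-in `FtpWebRequest` instead of ftp.exe. `FtpUploader`, `RemoteUri` and `Upload` are hypothetical names. Microsoft marks `WebRequest`/`FtpWebRequest` as obsolete for new development in .NET 6 and later, which is one reason a maintained third-party FTP library may be the better choice, as suggested above. `KeepAlive = true` (the default) asks the framework not to close the control connection after each request; whether it is actually reused depends on the runtime, so check the server log.

```csharp
using System;
using System.IO;
using System.Net;

public static class FtpUploader
{
    // Builds the remote URI for a local file, e.g. ftp://host/incoming/a.dat.
    public static Uri RemoteUri(string serverFolder, string localPath) =>
        new Uri(serverFolder.TrimEnd('/') + "/" + Uri.EscapeDataString(Path.GetFileName(localPath)));

    // Uploads one file and returns the server's final status line.
    public static string Upload(string serverFolder, string localPath, NetworkCredential credentials)
    {
        var request = (FtpWebRequest)WebRequest.Create(RemoteUri(serverFolder, localPath));
        request.Method = WebRequestMethods.Ftp.UploadFile;
        request.Credentials = credentials;
        request.UseBinary = true;  // send the bytes unchanged
        request.KeepAlive = true;  // don't close the control connection after this request

        using (FileStream source = File.OpenRead(localPath))
        using (Stream target = request.GetRequestStream())
            source.CopyTo(target);

        // GetResponse throws a WebException if the server rejects the upload.
        using (var response = (FtpWebResponse)request.GetResponse())
            return response.StatusDescription;
    }
}
```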

You should also consider how to handle the situation where the queue grows too large. One option could be to stop the producer until the queue size has dropped below a threshold.
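With an in-memory queue, a bounded `BlockingCollection<T>` provides this back-pressure directly: `Add` blocks once the capacity is reached, and `TryAdd` with a timeout lets the producer notice a full queue instead of blocking forever. `BoundedQueueDemo` is a hypothetical wrapper, and the capacity you choose is an assumption to tune against your bandwidth.

```csharp
using System;
using System.Collections.Concurrent;

public static class BoundedQueueDemo
{
    // Queue that holds at most 'capacity' pending file paths.
    public static BlockingCollection<string> CreateQueue(int capacity) =>
        new BlockingCollection<string>(new ConcurrentQueue<string>(), boundedCapacity: capacity);

    // Returns false if the queue stayed full for the whole timeout, so the
    // producer can pause (or alert) instead of piling up more work.
    public static bool TryEnqueue(BlockingCollection<string> queue, string path, TimeSpan timeout) =>
        queue.TryAdd(path, timeout);
}
```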

  1. Start a timer that fires once a second.
  2. In the timer's elapsed event handler, stop the timer.
  3. Get a list of all files in the incoming directory.
  4. Try to open each file exclusively. This prevents you from reading a file that is still being written to.
  5. Copy each file to a staging directory and delete it from the incoming directory.
  6. Once you've moved all of the files in your list, send the files in the staging directory via FTP.
  7. Once you've FTP'd the files, delete them from the staging directory.
  8. Start the timer again.
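The eight steps can be sketched with a one-shot `System.Threading.Timer` that is re-armed at the end of each cycle, so cycles never overlap. `StagingUploader`, `RunCycle` and the `upload` delegate are hypothetical; `upload` stands in for your existing FTP code. `File.Move` is used for step 5 (copy plus delete in one call, a cheap rename on the same volume).

```csharp
using System;
using System.IO;
using System.Threading;

public sealed class StagingUploader : IDisposable
{
    private static readonly TimeSpan Period = TimeSpan.FromSeconds(1);
    private readonly string _incoming, _staging;
    private readonly Action<string> _upload;
    private readonly Timer _timer;

    public StagingUploader(string incoming, string staging, Action<string> upload)
    {
        _incoming = incoming; _staging = staging; _upload = upload;
        Directory.CreateDirectory(staging);
        // Steps 1/2/8: a one-shot timer that is re-armed after each cycle.
        _timer = new Timer(_ => OnElapsed(), null, Period, Timeout.InfiniteTimeSpan);
    }

    private void OnElapsed()
    {
        try { RunCycle(); }
        catch (Exception) { /* log; files stay where they are and are retried next cycle */ }
        finally
        {
            try { _timer.Change(Period, Timeout.InfiniteTimeSpan); }
            catch (ObjectDisposedException) { /* shutting down */ }
        }
    }

    public void RunCycle()
    {
        // Steps 3-5: move every file that is no longer being written to staging.
        foreach (string file in Directory.GetFiles(_incoming))
        {
            try
            {
                using (new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.None)) { }
                File.Move(file, Path.Combine(_staging, Path.GetFileName(file)));
            }
            catch (IOException) { /* locked or name clash: retry next cycle */ }
        }

        // Steps 6-7: upload and delete; leftovers from a crash are sent here too.
        foreach (string staged in Directory.GetFiles(_staging))
        {
            _upload(staged);
            File.Delete(staged);
        }
    }

    public void Dispose() => _timer.Dispose();
}
```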

The timer's elapsed handler runs on the thread pool, so you shouldn't need any fancier thread management. Since your primary constraint is your FTP bandwidth, there's little advantage in doing anything else on other threads until the files are uploaded.

This approach protects you in case of a system crash. Files in the staging directory that weren't sent are picked up during the next cycle. The same goes for files in the incoming directory.

If your FTP receiving side can handle zipped files, you'll improve your throughput by zipping the contents of the staging directory and sending them as one file.
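Zipping the staging directory is a single call to `ZipFile.CreateFromDirectory` (System.IO.Compression). In this sketch, `StagingZipper` and `ZipStaging` are hypothetical names. The archive is written outside the staging folder so it doesn't try to include itself. After a successful upload, both the archive and the staged files would be deleted.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class StagingZipper
{
    // Packs every file in stagingDir into one archive under outputDir and
    // returns the archive path.
    public static string ZipStaging(string stagingDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        string zipPath = Path.Combine(outputDir,
            $"batch_{DateTime.UtcNow:yyyyMMdd_HHmmss_fff}_{Guid.NewGuid():N}.zip");
        ZipFile.CreateFromDirectory(stagingDir, zipPath, CompressionLevel.Optimal, includeBaseDirectory: false);
        return zipPath;
    }
}
```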

I would set up a chain of threads using BlockingCollections.

One producer thread reads the available files, using a timer, FileSystemWatcher, etc., and stores them in a BlockingCollection. It also records each file in a set to ensure it is added only once.

var availableFiles = new BlockingCollection<string>();
var processedFiles = new BlockingCollection<string>();
var newFiles = new HashSet<string>();

...
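lock (newFiles) {
    // incomingDir is a placeholder for the watched folder (Directory.GetFiles needs a path).
    foreach (var file in Directory.GetFiles(incomingDir))
        if (newFiles.Add(file))      // Add returns false if the file is already queued
            availableFiles.Add(file);
}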

One or more FTP threads send the files and then put them into the processed collection:

foreach (var file in availableFiles.GetConsumingEnumerable()) {
    SendFileOverFtp(file);
    processedFiles.Add(file);
}

One thread cleans up the processed files:

foreach (var file in processedFiles.GetConsumingEnumerable()) {
    lock (newFiles) {
        File.Delete(file);
        newFiles.Remove(file);
    }
}

Another alternative is to have the producing thread also read the files into memory and delete them. In that case, you can skip the last stage and the newFiles collection.
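That in-memory variant might look roughly like this. `InMemoryProducer`, `Payload` and `Drain` are hypothetical names. As noted earlier for in-memory queues, a crash loses whatever is still queued, because the files on disk have already been deleted.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

public static class InMemoryProducer
{
    // One unit of work: the original file name plus its contents.
    public sealed class Payload
    {
        public Payload(string name, byte[] data) { Name = name; Data = data; }
        public string Name { get; }
        public byte[] Data { get; }
    }

    // Reads each file into memory, deletes it and queues the bytes for the FTP
    // threads. No cleanup stage or newFiles set is needed.
    public static int Drain(string incomingDir, BlockingCollection<Payload> queue)
    {
        int count = 0;
        foreach (string file in Directory.GetFiles(incomingDir))
        {
            byte[] data;
            try
            {
                data = File.ReadAllBytes(file); // fails while the writer still has it open
                File.Delete(file);
            }
            catch (IOException) { continue; }   // retry on the next pass
            queue.Add(new Payload(Path.GetFileName(file), data));
            count++;
        }
        return count;
    }
}
```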

As an FTP server owner in this situation, I'd also ask that you find a way to stay signed on as much as possible.

Sign-ons and sign-offs are often more "expensive" (in terms of computation, config blocking, etc.) than individual file transfers.
