
Multithreading with FileSystemWatcher and/or an MSMQ WCF service

Posted 2012-02-16 13:06:52 · Tags: c#, multithreading

I need to create a service that is basically responsible for the following:

1. Watch a specific folder for newly created files.
2. When a new file appears, read it, process it, and save the data in the database.

For the above task, I am thinking of creating a multithreaded service using one of the following approaches:

1. In the main thread, create an instance of FileSystemWatcher and, as soon as a new file is created, add that file to a thread queue. N consumer threads will be running, each taking a file from the queue and processing it (i.e. step 2).
2. Again in the main thread, create an instance of FileSystemWatcher and, as soon as a new file is created, read that file and add the data to MSMQ using a WCF MSMQ service. When the WCF MSMQ service reads the message, it is responsible for the further processing.

I am a newbie when it comes to creating a multithreaded service, so I am not sure which would be the best option. Please guide me.

Thanks,

2 answers

First off, let me say that you have taken a wise approach in choosing a single-producer, multiple-consumer model. This is the best approach in this case.

I would go for option 1, using a ConcurrentQueue data structure, which gives you an easy way to queue tasks in a thread-safe manner. Alternatively, you can simply use the ThreadPool.QueueUserWorkItem method to send work directly to the built-in thread pool, without worrying about managing the workers or the queue explicitly.
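As a rough illustration of option 1, here is a minimal sketch of one watcher feeding N worker tasks. It uses BlockingCollection<string>, which wraps a ConcurrentQueue by default and adds blocking reads, rather than a bare ConcurrentQueue. The class name FolderProcessor and the processFile callback are hypothetical names made up for this example; the callback stands in for the "read, process, save to DB" step.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: one FileSystemWatcher (producer), N worker tasks (consumers).
public sealed class FolderProcessor : IDisposable
{
    // BlockingCollection uses a ConcurrentQueue internally by default.
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly FileSystemWatcher _watcher;
    private readonly Task[] _workers;

    public FolderProcessor(string folder, int workerCount, Action<string> processFile)
    {
        _workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = Task.Factory.StartNew(() =>
            {
                // Blocks until an item is available; ends after CompleteAdding().
                foreach (string path in _queue.GetConsumingEnumerable())
                {
                    try { processFile(path); }
                    catch (Exception ex) { Console.Error.WriteLine(path + ": " + ex.Message); }
                }
            }, TaskCreationOptions.LongRunning);
        }

        _watcher = new FileSystemWatcher(folder);
        // Created can fire before the writer has finished the file,
        // so processFile may need to retry opening it.
        _watcher.Created += (sender, e) =>
        {
            try { _queue.Add(e.FullPath); }
            catch (InvalidOperationException) { /* queue already shut down */ }
        };
        _watcher.EnableRaisingEvents = true;
    }

    public void Dispose()
    {
        _watcher.Dispose();       // stop producing
        _queue.CompleteAdding();  // let the workers drain the queue and exit
        Task.WaitAll(_workers);
        _queue.Dispose();
    }
}
```

A caller would create it with something like new FolderProcessor(folderPath, 4, SaveToDatabase), where SaveToDatabase is whatever method does the actual processing, and dispose it when the service stops.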

Edit: Regarding the reliability of FileSystemWatcher, MSDN says:

The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.

So it depends on how often changes will occur and how much buffer you allocate.
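To make the MSDN advice concrete, here is a small sketch of a watcher configured along those lines. The helper name WatcherSetup, the onNewFile and rescan callbacks, and the 64 KB buffer size are assumptions chosen for illustration; the right buffer size depends on how bursty the folder is.

```csharp
using System;
using System.IO;

// Hypothetical helper showing the settings the MSDN passage mentions.
public static class WatcherSetup
{
    public static FileSystemWatcher Create(string folder, Action<string> onNewFile, Action rescan)
    {
        var watcher = new FileSystemWatcher(folder)
        {
            // Narrow the notifications so the buffer fills more slowly.
            IncludeSubdirectories = false,
            NotifyFilter = NotifyFilters.FileName,
            // Default is 8 KB; 64 KB here is an assumed value, not a recommendation.
            InternalBufferSize = 64 * 1024
        };

        watcher.Created += (sender, e) => onNewFile(e.FullPath);

        // On overflow, individual events are lost, so fall back to
        // re-listing the folder (the rescan callback) to catch missed files.
        watcher.Error += (sender, e) =>
        {
            if (e.GetException() is InternalBufferOverflowException)
                rescan();
        };

        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

The rescan callback could, for example, enumerate the folder with Directory.GetFiles and enqueue anything not yet processed, so an overflow costs time but not data.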

I would also consider your requirements for failure handling and the sizes of the files you are sending. Whether you choose option 1 or 2 will depend on your specifications.

Option 2 has the advantage that, by using MSMQ, your data is persisted in a recoverable way, even if you need to restart your machine. Option 1 only keeps your data in memory, where it might get lost.

On the other hand, option 2 has the disadvantage that the message size of MSMQ is limited to 4 MB per message (explanation in a Microsoft blog here), and therefore to only half of that when working with Unicode characters, while the in-memory queues are capable of much bigger sizes.

[Edit]

Thinking about it a bit longer, I would prefer option 2. In your comment, you mention that you want to move files around in the filesystem. This can be very expensive in terms of performance, and even worse if you move the files between different partitions.

I have used MSMQ in multiple projects at work and am convinced that it would work well for what you want to do. A big advantage here is that MSMQ works with transactional communication. That means that if, for some reason, a network, power, or other failure occurs, neither your message nor your files get lost.

If any of those happen while you are moving a file around, the file could easily get corrupted.

The only thing that gives me a stomach ache is the file sizes. To work around the message size limit of 4 MB (see the link added above), I would not put the file content into a message. Instead, I would send only an ID or a file path, so that the consuming service can find the file and read it when needed.

This keeps the message and queue sizes small and avoids using too much bandwidth or memory in the network and on your server(s).
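The "send only the path" idea could look roughly like the sketch below. Note that it talks to MSMQ directly through System.Messaging (.NET Framework only, with MSMQ installed) instead of through a WCF MSMQ binding as in the question, simply because that is shorter to show. The class name FileNotificationQueue and the queue path passed in by the caller are hypothetical.

```csharp
using System;
using System.Messaging; // .NET Framework; requires a reference to System.Messaging.dll

// Hypothetical helper: the message carries only the file path, never the content.
public static class FileNotificationQueue
{
    public static void SendPath(string queuePath, string filePath)
    {
        if (!MessageQueue.Exists(queuePath))
            MessageQueue.Create(queuePath, true); // true = transactional queue

        using (var queue = new MessageQueue(queuePath))
        using (var tx = new MessageQueueTransaction())
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            tx.Begin();
            try
            {
                queue.Send(filePath, tx); // a short string, far below the 4 MB limit
                tx.Commit();
            }
            catch
            {
                tx.Abort();
                throw;
            }
        }
    }

    public static string ReceivePath(string queuePath, TimeSpan timeout)
    {
        using (var queue = new MessageQueue(queuePath))
        using (var tx = new MessageQueueTransaction())
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            tx.Begin();
            try
            {
                Message message = queue.Receive(timeout, tx);
                string path = (string)message.Body;
                tx.Commit(); // in real use, commit only after the file was processed
                return path;
            }
            catch
            {
                tx.Abort(); // the message goes back to the queue
                throw;
            }
        }
    }
}
```

With this shape, the watcher side calls SendPath for each new file and the consuming service calls ReceivePath, opens the file itself, and processes it. The file has to stay readable at that path until the consumer is done with it.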