
best practice for directory polling

I have to do batch processing to automate a business process. I have to poll a directory at regular intervals to detect new files and process them. While old files are being processed, new files can come in. For now, I use the Quartz scheduler and thread synchronization to ensure that only one thread processes files at a time.

Part of the code is:

application-context.xml

<bean id="methodInvokingJob"
  class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
  <property name="targetObject" ref="documentProcessor" />
  <property name="targetMethod" value="processDocuments" />
</bean>

DocumentProcessor
.....

public void processDocuments() { 
  LOG.info(Thread.currentThread().getName() + " attempt to run.");
  if (!processing) {
     synchronized (this) {
        try {
           processing = true;
           LOG.info(Thread.currentThread().getName() + " is processing");
           List<String> xmlDocuments = documentManager.getFileNamesFromFolder(incomingFolderPath);               
           // loop over the files and process the ones that are not locked
           for (String xmlDocument : xmlDocuments) {
              processDocument(xmlDocument);
           }
        }
        finally {
           processing = false;
        }
     }
  }
}

With the current code, I have to prevent other threads from processing files while one thread is processing. Is that a good idea, or should we support multi-threaded processing? In that case, how can I know which files are being processed and which files have just arrived? Any ideas are really appreciated.

I would build it with these parts:

  1. Castle Transactions with TxF
  2. FileSystemWatcher (Java version)
  3. TransactionScope (no Java version unless you hack it a lot)
  4. A lock-free queue (there is a paper discussing performance of Java vs. .NET; you might be able to get Java source from its authors), or Java lock-based queues

Such that:

When there's a new file, the file system watcher detects it and puts the file path in the queue (remember to set the correct flags, handle the error condition, set Enabled <- True, and watch out for duplicate events).

You have an application thread and n worker threads. If this is the only app, the workers spin-wait on the queue with TryDequeue; otherwise they block on a monitor: while(!Monitor.Enter(has_items)) ;

When a worker thread gets a path through the dequeue operation, it starts working on it, and no other thread can work on that path. If duplicate outputs are possible (depending on your setup), you can use a file transaction while writing the output file. If the Commit operation fails, you know another thread has already written the output file, and the worker resumes polling the queue.
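In Java, the detection step could be sketched with the standard java.nio.file.WatchService standing in for FileSystemWatcher. The class name DirectoryWatcher, the offerOnce helper, and the in-memory seen set are assumptions made for illustration, not something from the parts listed above. Note that ENTRY_CREATE can fire before the writer has finished the file, so a real setup would probably still need a readiness check before processing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical watcher: turns "file created" events into queue entries.
public class DirectoryWatcher implements Runnable {
    private final Path dir;
    private final BlockingQueue<Path> queue;
    // Paths already handed to the queue, used to drop duplicate events.
    private final Set<Path> seen = ConcurrentHashMap.newKeySet();

    public DirectoryWatcher(Path dir, BlockingQueue<Path> queue) {
        this.dir = dir;
        this.queue = queue;
    }

    /** Enqueues a path only the first time it is seen; returns true if enqueued. */
    public boolean offerOnce(Path file) {
        return seen.add(file) && queue.offer(file);
    }

    @Override
    public void run() {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (!Thread.currentThread().isInterrupted()) {
                WatchKey key = watcher.take(); // blocks until events arrive
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a real version would rescan dir
                    }
                    Path name = (Path) event.context(); // relative to dir
                    offerOnce(dir.resolve(name));
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The watcher would run on its own thread, for example `new Thread(new DirectoryWatcher(dir, queue)).start()`, with the worker threads taking paths from the same queue.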

I'd do the following:

  • One thread that gets your filenames and adds them to a synchronized queue.

  • Multiple threads to do the actual reading: get an item from the synced queue and process it.
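The two bullets above could look roughly like the following sketch. QueueWorkers, processAll, and the STOP sentinel are hypothetical names introduced here; the calling thread plays the role of the single producer, and the processor callback stands in for something like processDocument from the question.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class QueueWorkers {
    // Hypothetical sentinel telling a worker to stop; NUL cannot occur in a real file name.
    private static final String STOP = "\u0000STOP";

    /** Feeds fileNames to workerCount workers; returns true if all workers finished in time. */
    public static boolean processAll(List<String> fileNames, int workerCount,
                                     Consumer<String> processor) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        String name = queue.take(); // each item goes to exactly one worker
                        if (STOP.equals(name)) {
                            return;
                        }
                        try {
                            processor.accept(name);
                        } catch (RuntimeException e) {
                            // One bad file should not kill the worker.
                            System.err.println("Failed to process " + name + ": " + e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            for (String name : fileNames) {
                queue.put(name); // producer side
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP); // one sentinel per worker
            }
            workers.shutdown();
            return workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            workers.shutdownNow();
            return false;
        }
    }
}
```

Because a blocking queue hands each element to a single taker, no extra synchronization is needed to keep two workers off the same file name.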

To check whether a file is still in use, you can simply try to rename/move it.
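A minimal sketch of the rename/move check, assuming a separate "processing" directory on the same file system (FileClaimer and tryClaim are hypothetical names). Whether a move fails while another process still has the file open is platform-dependent: it typically fails on Windows but can succeed on Linux, so this is not a complete "is the writer done" test everywhere. It does double as a way to claim a file, since only one caller can move it successfully.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

public class FileClaimer {
    /**
     * Tries to move the file into processingDir.
     * Returns the new location if the move succeeded, or empty if the file
     * is missing, locked, already claimed, or cannot be moved atomically.
     */
    public static Optional<Path> tryClaim(Path file, Path processingDir) {
        try {
            Files.createDirectories(processingDir);
            Path target = processingDir.resolve(file.getFileName());
            // ATOMIC_MOVE: the move happens as one file system operation or fails.
            return Optional.of(Files.move(file, target, StandardCopyOption.ATOMIC_MOVE));
        } catch (IOException e) {
            // Covers NoSuchFileException, AccessDeniedException,
            // AtomicMoveNotSupportedException and similar failures.
            return Optional.empty();
        }
    }
}
```

A worker would call tryClaim before processing and simply skip the file when the result is empty, retrying it on the next poll.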


 