
Application Design: Hundreds of File Pollers in Spring Integration

A colleague and I were discussing the architecture of an application we are building on WebLogic. The gist of the application is this: files are placed on a network drive, some processing is done, and files go out. The files fall into categories called transactions. The debate is whether it is best to have the files of all the different transactions arrive in one folder, with one inbound file adapter watching that folder, or to separate the folders by transaction and have one inbound file adapter per transaction.

The system can have a few hundred transactions, so with a 1:1 ratio there would be hundreds of pollers. It may also be possible to group them, but we'd probably still have 50+ directories.

Not all transactions have the same throughput requirements. Some need to be picked up in near real time; for others, it is enough to look at the folder once a day. Some transactions could have tens of thousands of files per day.

From a high level, the first component obtains the filename from a directory, moves the file to the next folder and places a message on the queue alerting the next downstream component to work on the file.

Advantage for 1 directory:

  • You only have 1 thread running.

Disadvantage:

  • You need to poll constantly and very fast. You'll plateau in terms of scalability, as one inbound adapter can only pick up a certain number of files per second per directory. If using multiple JVMs, the pollers will fight over which one locks a given file and can move it.

Advantage for many directories:

  • You have finer control over how each transaction is picked up. Transaction XYZ may only need to be scheduled to run once a day, while ABC runs every 5 minutes. XYZ won't get in the way of ABC, so if there are 10,000 files of XYZ and 1 file of ABC, ABC will still get picked up quickly.
  • Scalability. If I have 100 directories and I find there aren't enough resources, I can, for example, deploy 5 of the 'file receivers' and have each look at 20 different directories. (A side note: my colleague wants to build a monolith, while I want to break out each component into its own deployable. Theoretically, if broken out, I believe it's more scalable, as we can increase the number of receivers.)

Disadvantage:

  • Many inbound adapter threads polling (though not always actively).

My question to the community is: as far as Spring Integration is concerned, how terrible is it to have potentially hundreds of inbound file adapters started up in the app? What issues may arise? I assume that when a file inbound adapter is not listing the directory, it's pretty much idle and consumes no resources?

We are using WebLogic as the app server, and my coworker also suggests using the Work Manager to manage thread resources in other parts of the system. Could that also be used to handle hundreds of inbound adapters?

Thanks!

Pollers share a single task scheduler. The default pool has 10 threads, but that can be increased, so that's not really an issue. And yes, no resources are consumed between polls.
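To make the shared-scheduler point concrete, here is a minimal sketch in plain `java.util.concurrent`, not Spring Integration itself. It schedules a few hundred hypothetical pollers on one pool of 10 threads and reports how many threads actually did the work. The class name, the poll periods and the empty "poll" body are all assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class SharedSchedulerSketch {

    /**
     * Schedules pollerCount hypothetical pollers on one pool of poolSize threads,
     * waits until every poller has run at least once, and returns the names of
     * the threads that executed them.
     */
    static Set<String> runPollers(int pollerCount, int poolSize) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(poolSize);
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        CountDownLatch firstPolls = new CountDownLatch(pollerCount);
        try {
            for (int i = 0; i < pollerCount; i++) {
                AtomicBoolean polledOnce = new AtomicBoolean();
                // Assumed schedules: half the "transactions" poll often, half rarely.
                long delayMs = (i % 2 == 0) ? 50 : 200;
                scheduler.scheduleWithFixedDelay(() -> {
                    // A real poller would list its directory here.
                    threadsUsed.add(Thread.currentThread().getName());
                    if (polledOnce.compareAndSet(false, true)) {
                        firstPolls.countDown();
                    }
                }, 0, delayMs, TimeUnit.MILLISECONDS);
            }
            if (!firstPolls.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("not every poller ran in time");
            }
            return new HashSet<>(threadsUsed);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> used = runPollers(300, 10);
        System.out.println("300 pollers ran on " + used.size() + " thread(s)");
    }
}
```

Between runs a scheduled task is just an entry in the scheduler's delay queue, so an idle poller holds no thread. A slow poll does occupy a pool thread while it runs, which is why the pool size may need raising if many directories are slow to list.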

"From a high level, the first component obtains the filename from a directory, moves the file to the next folder and places a message on the queue alerting the next downstream component to work on the file."

Since the poller does so little work (move the file and send a message to a queue), I don't think having a single instance (perhaps with a warm standby) will be a limiting factor.
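For reference, the "move the file and send a message" step can be sketched roughly as follows with only the JDK. This is not the Spring Integration adapter: the `BlockingQueue` stands in for a real JMS or RabbitMQ queue, the class and method names are hypothetical, and using an atomic move to claim a file is my assumption rather than something stated in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FileReceiverSketch {

    /**
     * One poll: claims each regular file in inbox by moving it into work,
     * then enqueues the new path. Returns the number of files claimed.
     * Both directories are assumed to be on the same file system.
     */
    static int pollOnce(Path inbox, Path work, BlockingQueue<String> queue) throws IOException {
        List<Path> candidates;
        try (Stream<Path> listing = Files.list(inbox)) {
            candidates = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int claimed = 0;
        for (Path source : candidates) {
            Path target = work.resolve(source.getFileName());
            try {
                // The move is the claim: if another poller already took the
                // file, the source is gone and we skip it.
                Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (NoSuchFileException alreadyClaimed) {
                continue;
            }
            // Stand-in for sending a message to the downstream queue.
            queue.add(target.toString());
            claimed++;
        }
        return claimed;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FileReceiverSketch <inboxDir> <workDir>
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        int claimed = pollOnce(Paths.get(args[0]), Paths.get(args[1]), queue);
        System.out.println("claimed " + claimed + " file(s)");
    }
}
```

The per-file cost is one rename and one enqueue, which is why a single receiver is unlikely to be the bottleneck. The heavy processing happens downstream of the queue.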

"my colleague wants to build a monolith...while I want to break out each component into its own deployable"

I concur with your approach. Using middleware (JMS, RabbitMQ) to distribute the work gives you the most flexibility: you can increase the consumer threads in each instance and add more instances as needed.
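As a rough illustration of the "increase the consumer threads" knob, the sketch below drains an in-memory queue with a configurable number of worker threads. It is only a stand-in: with real middleware you would raise the listener concurrency rather than manage threads yourself, and the class name and the no-op "processing" are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class QueueConsumersSketch {

    /**
     * Processes every message using consumerThreads competing consumers
     * and returns how many messages were handled in total.
     */
    static int drain(List<String> messages, int consumerThreads) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(messages);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService consumers = Executors.newFixedThreadPool(consumerThreads);
        for (int i = 0; i < consumerThreads; i++) {
            consumers.execute(() -> {
                // Each consumer takes messages until the queue is empty;
                // a message is delivered to exactly one consumer.
                while (queue.poll() != null) {
                    // Real work on the named file would go here.
                    processed.incrementAndGet();
                }
            });
        }
        consumers.shutdown();
        if (!consumers.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("consumers did not finish in time");
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> files = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt", "e.txt");
        System.out.println("processed " + drain(files, 3) + " message(s)");
    }
}
```

Because the consumers compete for messages, adding threads (or whole instances, with a real broker) raises throughput without touching the receiver.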
