How to efficiently manage files on a filesystem in Java?

I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) to the filesystem, in some sensible hierarchy. There will be hundreds, even thousands, of files per day. I also need to store metadata for each file.

I am considering putting the metadata (just a couple of fields) into a database table, but storing the XML content itself as files on the filesystem, so as not to bloat the database with content data that is seldom read.

Is there some simple library that helps me with saving, loading, deleting, etc. of the files? It's not that tricky to implement myself, but I wonder if there are existing solutions? Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).

Or do I even need that, should I just go with raw/custom Java?

Java API

Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.
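A minimal sketch of that approach, using only java.io. The class and method names (MessageStore, save, load) are my own, not from any library:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class MessageStore {

    // Save one XML message under the given directory, creating it if needed.
    static File save(File dir, String name, String xml) throws IOException {
        dir.mkdirs();                       // build the hierarchy if missing
        File target = new File(dir, name);  // portable: no hard-coded separator
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return target;
    }

    // Read the whole file back as a UTF-8 string.
    static String load(File file) throws IOException {
        byte[] buf = new byte[(int) file.length()];
        try (FileInputStream in = new FileInputStream(file)) {
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) break;
                off += n;
            }
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "msgs-demo");
        File f = save(dir, "req-001.xml", "<request id=\"1\"/>");
        System.out.println(load(f)); // prints <request id="1"/>
        f.delete();
    }
}
```

For thousands of files per day, a date-based hierarchy such as `2024/01/15/` as the `dir` argument keeps any single directory from growing unbounded.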

You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.

Java is independent of the OS. You just need to make sure you use File.separator (not File.pathSeparator, which separates entries in a path list such as CLASSPATH), or use the constructor File(File parent, String child) so that you don't need to mention the separator explicitly at all.
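To illustrate the distinction between the two constants and the separator-free constructor:

```java
import java.io.File;

public class SeparatorDemo {
    public static void main(String[] args) {
        // File.separator separates path components ("/" on *nix, "\" on Windows);
        // File.pathSeparator separates whole paths in a list (":" vs ";"),
        // as in the CLASSPATH environment variable.
        System.out.println("separator:     " + File.separator);
        System.out.println("pathSeparator: " + File.pathSeparator);

        // The parent/child constructor inserts the right separator for you.
        File portable = new File(new File("archive"), "2024-01-15");
        System.out.println(portable.getPath()); // "archive/2024-01-15" on *nix
    }
}
```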

The Java file API is relatively high-level, abstracting away the differences between operating systems. Most of the time it's sufficient. It has shortcomings only if you need a relatively OS-specific feature that is not in the API, e.g. checking the physical size of a file on disk (as opposed to its logical size), security rights on *nix, or the free space/quota of the hard drive.

Most OSes keep an internal buffer for file reads and writes. Using FileOutputStream.write and FileOutputStream.flush ensures the data has been handed to the OS, but not necessarily written to disk. The Java API also supports this low-level integration (via FileDescriptor.sync) to manage these buffering issues, which systems such as databases rely on.
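A minimal sketch of such a forced write; the helper name writeDurably is my own, but getFD().sync() is the standard way to request that the OS flush its buffers to the physical disk (the equivalent of fsync on *nix):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class DurableWrite {

    // flush() pushes data from JVM buffers to the OS;
    // getFD().sync() asks the OS to push it to the physical disk.
    static void writeDurably(File file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);
            out.flush();         // JVM -> OS
            out.getFD().sync();  // OS -> disk
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("msg", ".xml");
        writeDurably(f, "<ok/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(f.length()); // 5
        f.delete();
    }
}
```

The sync call is noticeably slower than a plain write, so reserve it for data you cannot afford to lose on a crash.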

Also, both files and directories are represented by File, so you need to check with isDirectory. This can be confusing, for instance when a file and a directory share the same name.
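A small illustration of the check (the describe helper is hypothetical): a File object is just a path, and only the filesystem knows what, if anything, it points at.

```java
import java.io.File;
import java.io.IOException;

public class KindCheck {

    // Classify what a File path actually refers to on disk.
    static String describe(File f) {
        if (!f.exists())     return "missing";
        if (f.isDirectory()) return "directory";
        if (f.isFile())      return "file";
        return "other"; // e.g. a device node on *nix
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("probe", ".xml");
        System.out.println(describe(tmp));                 // file
        System.out.println(describe(tmp.getParentFile())); // directory
        tmp.delete();
        System.out.println(describe(tmp));                 // missing
    }
}
```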

Web service

The web service can pass the data either as xs:base64Binary, or via MTOM (Message Transmission Optimization Mechanism) if the files are large.
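For context, xs:base64Binary simply means the file bytes are base64-encoded inline in the SOAP body, which inflates the payload by roughly a third; this is why MTOM, which sends the raw bytes as a separate MIME part, wins for large files. java.util.Base64 shows the transformation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Demo {
    public static void main(String[] args) {
        byte[] xml = "<msg/>".getBytes(StandardCharsets.UTF_8);

        // What xs:base64Binary content looks like on the wire.
        String wire = Base64.getEncoder().encodeToString(xml);
        System.out.println(wire); // PG1zZy8+

        byte[] back = Base64.getDecoder().decode(wire);
        System.out.println(new String(back, StandardCharsets.UTF_8)); // <msg/>
    }
}
```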

Transactions

Note that the database is transactional while the file system is not, so you might have to add a few checks for operations that fail and are retried.

You could go with a complicated design involving some form of distributed transaction (see this answer ), or try to go with a simpler design that provides the level of robustness that you need. A possible design could be:

  • Update . If the user wants to overwrite a file, you actually create a new one. The indirection between the logical file name and the physical file is stored in the database. This way you never overwrite a physical file once it is written, which keeps rollback consistent.
  • Create . The same story when the user wants to create a file.
  • Delete . If the user wants to delete a file, you do it only in the database first. A periodic job polls the file system for files that are not listed in the database and removes them. This two-phase delete ensures that the delete operation can be rolled back.
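The three rules above can be sketched in one class. This is an illustrative in-memory version (FileCatalog is my own name, and the HashMap stands in for the database table; a real version would use JDBC inside a transaction):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class FileCatalog {
    private final File root;
    private final Map<String, String> index = new HashMap<>(); // logical -> physical

    public FileCatalog(File root) {
        this.root = root;
        root.mkdirs();
    }

    // Create and Update are the same operation: always write a NEW physical
    // file, then repoint the logical name in the "database" as the commit point.
    public void put(String logicalName, byte[] content) throws IOException {
        String physical = UUID.randomUUID() + ".xml";
        try (FileOutputStream out = new FileOutputStream(new File(root, physical))) {
            out.write(content);
        }
        index.put(logicalName, physical); // commit: old physical file becomes an orphan
    }

    // Delete only touches the database; the physical file is reclaimed later.
    public void delete(String logicalName) {
        index.remove(logicalName);
    }

    // The periodic job: remove physical files no longer referenced in the database.
    public int sweep() {
        Set<String> live = new HashSet<>(index.values());
        File[] files = root.listFiles();
        if (files == null) return 0;
        int removed = 0;
        for (File f : files) {
            if (!live.contains(f.getName()) && f.delete()) removed++;
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "catalog-demo");
        FileCatalog cat = new FileCatalog(dir);
        cat.put("order-42.xml", "<order/>".getBytes());
        cat.put("order-42.xml", "<order v=\"2\"/>".getBytes()); // update = new physical file
        System.out.println(cat.sweep()); // reclaims the superseded physical file
    }
}
```

A crash between the file write and the index update leaves only an orphaned physical file, which the next sweep removes; no state is ever half-committed in the database.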

This is not as robust as writing BLOBs into a real transactional database, but it provides some robustness. You could otherwise have a look at commons-transaction, but I feel the project is dead (last release 2007).

There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.

Support for XML and JSON seems to be experimental.
