How to efficiently manage files on a filesystem in Java?

I am creating a few JAX-WS endpoints, and I want to save the received and sent messages for later inspection. To do this, I plan to save the messages (XML files) to the filesystem in some sensible hierarchy. There will be hundreds, even thousands, of files per day. I also need to store metadata for each file.

I am considering putting the metadata (just a couple of fields) into a database table, but keeping the XML content itself in files on the filesystem, so as not to bloat the database with content data that is seldom read.

Is there a simple library that helps with saving, loading, deleting, etc. the files? It's not that tricky to implement myself, but I wonder whether there are existing solutions. Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).

Or do I even need that? Should I just go with raw/custom Java?

Java API

Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, etc.) and a few stream manipulations with FileInputStream and FileOutputStream for reading and writing.

You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.

Java is independent of the OS. You just need to make sure you use File.separator (not File.pathSeparator, which separates the entries of a path list such as PATH), or use the constructor File(File parent, String child) so that you don't need to mention the separator explicitly.
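To make this concrete, here is a minimal sketch of saving and loading a message in a dated directory hierarchy using only java.io. The class name MessageFiles, the year/month/day layout and the file name are my own assumptions for illustration, not something from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; the directory layout is an assumption.
final class MessageFiles {

    // Builds root/<year>/<month>/<day>/<name> without writing a separator by hand.
    static File resolve(File root, int year, int month, int day, String name) {
        File yearDir = new File(root, String.valueOf(year));
        File monthDir = new File(yearDir, String.format(Locale.ROOT, "%02d", month));
        File dayDir = new File(monthDir, String.format(Locale.ROOT, "%02d", day));
        return new File(dayDir, name);
    }

    // Creates missing parent directories, then writes the XML as UTF-8.
    static void save(File target, String xml) throws IOException {
        File parent = target.getParentFile();
        if (!parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Could not create directory " + parent);
        }
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads the whole file back into a String.
    static String load(File source) throws IOException {
        try (FileInputStream in = new FileInputStream(source)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Deleting is then just a call to delete() on the File returned by resolve; it returns false if the file could not be removed.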

The Java file API is relatively high-level, in order to abstract away the differences between operating systems. Most of the time it's sufficient. It falls short only if you need some OS-specific feature that is not in the API, e.g. checking the physical size of a file on disk (not the logical size), security rights on *nix, free space/quota of the hard drive, etc.

Most operating systems have an internal buffer for file writing/reading. Using FileOutputStream.write and FileOutputStream.flush ensures the data has been sent to the OS, but not necessarily written to disk. The Java API also supports this low-level integration to manage these buffering issues (example here) for systems such as databases.
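One way to ask for that stronger guarantee, as far as I know, is FileDescriptor.sync(), reachable through FileOutputStream.getFD(). The following is only a sketch; the class and method names are made up for illustration, and how durable the result really is still depends on the OS and the hardware.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper showing the difference between flush() and sync().
final class DurableWrite {

    static void writeAndSync(File target, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(data);
            // flush() only pushes Java-side buffers to the OS. A bare
            // FileOutputStream has none, so this matters mainly when the
            // stream is wrapped in a BufferedOutputStream.
            out.flush();
            // sync() asks the OS to force its own buffers to the storage
            // device and throws SyncFailedException if that cannot be done.
            out.getFD().sync();
        }
    }
}
```

Syncing is comparatively slow, so for message archiving it is probably worth doing only if losing the last few files on a crash is unacceptable.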

Also, both files and directories are abstracted by File, and you need to check with isDirectory to tell them apart. This can be confusing, for instance if you have a file x and a directory /x (I don't remember exactly how to handle this issue, but there is a way).
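A small sketch of that check, with a made-up helper name, could look like this:

```java
import java.io.File;

// Hypothetical helper: classifies what a File object points at.
final class PathKind {

    static String describe(File f) {
        if (!f.exists()) {
            return "missing";
        }
        if (f.isDirectory()) {
            return "directory";
        }
        if (f.isFile()) {
            return "file";
        }
        return "other"; // e.g. a device or socket on some systems
    }
}
```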

Web service

The web service can either use xs:base64Binary to pass the data, or use MTOM (Message Transmission Optimization Mechanism) if the files are large.

Transactions

Note that the database is transactional and the file system is not. So you might have to add a few checks in case operations fail and are retried.

You could go with a complicated design involving some form of distributed transaction (see this answer), or try a simpler design that provides the level of robustness you need. A possible design could be:

  • Update. If the user wants to overwrite a file, you actually create a new one. The level of indirection between the logical file name and the physical file is stored in the database. This way you never overwrite a physical file once it is written, which keeps rollback consistent.
  • Create. Same story when the user wants to create a file.
  • Delete. If the user wants to delete a file, you delete it only in the database at first. A periodic job polls the file system to identify files that are not listed in the database and removes them. This two-phase delete ensures that the delete operation can be rolled back.
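The three rules above could be sketched roughly as follows. Everything here is hypothetical: the class name, the UUID-based physical names, and in particular the in-memory map, which merely stands in for the database table that would hold the logical-to-physical mapping in a real system.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the write-once design described above.
final class VersionedStore {
    private final File storageDir;
    // Stand-in for the database table: logical name -> physical file name.
    private final Map<String, String> index = new ConcurrentHashMap<>();

    VersionedStore(File storageDir) {
        this.storageDir = storageDir;
    }

    // Create and update are the same operation: always write a fresh
    // physical file, then point the logical name at it.
    void put(String logicalName, String content) throws IOException {
        String physicalName = UUID.randomUUID() + ".xml";
        File physical = new File(storageDir, physicalName);
        Files.write(physical.toPath(), content.getBytes(StandardCharsets.UTF_8));
        index.put(logicalName, physicalName); // the step a DB transaction would commit
    }

    // Returns the current content, or null if the logical name is unknown.
    String read(String logicalName) throws IOException {
        String physicalName = index.get(logicalName);
        if (physicalName == null) {
            return null;
        }
        File physical = new File(storageDir, physicalName);
        return new String(Files.readAllBytes(physical.toPath()), StandardCharsets.UTF_8);
    }

    // Phase one of delete: only the mapping is removed.
    void delete(String logicalName) {
        index.remove(logicalName);
    }

    // Phase two, run periodically: remove files no mapping refers to.
    // Returns the number of files actually deleted.
    int sweep() {
        Set<String> live = new HashSet<>(index.values());
        File[] files = storageDir.listFiles();
        int removed = 0;
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && !live.contains(f.getName()) && f.delete()) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

Note that a real cleanup job would probably need to skip recently created files, since a file can exist on disk for a moment before the transaction that references it has committed.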

This is not as robust as writing a BLOB in a real transactional database, but it provides some robustness. You could otherwise have a look at commons-transaction, but I feel like the project is dead (2007).

There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.

Support for XML and JSON seems to be experimental.
