How to efficiently manage files on a filesystem in Java?
I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) into the filesystem, in some sensible hierarchy. There will be hundreds, even thousands of files per day. I also need to store metadata for each file.
I am considering putting the metadata (just a couple of fields) into a database table, but the XML content itself into files on the filesystem, so as not to bloat the database with content data that is seldom read.
Is there some simple library that helps me with saving, loading, deleting, etc. the files? It's not that tricky to implement it myself, but I wonder whether there are existing solutions. Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).

Or do I even need that? Should I just go with raw/custom Java?
Java API
Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.
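A minimal sketch of save/load/delete built only on java.io.File and those two stream classes. The class and method names (SimpleMessageStore, save, load, delete) are illustrative, not an existing library API, and reading a whole file into memory assumes the XML messages are reasonably small.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative helper class, not an existing library.
public class SimpleMessageStore {

    // Writes the bytes to the file, creating missing parent directories first.
    static void save(File file, byte[] content) throws IOException {
        File parent = file.getParentFile();
        if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("Cannot create directory " + parent);
        }
        try (OutputStream out = new FileOutputStream(file)) {
            out.write(content);
        }
    }

    // Reads the whole file into memory (assumes messages are small).
    static byte[] load(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }

    // Deletes the file; returns false if there was nothing to delete.
    static boolean delete(File file) throws IOException {
        if (!file.exists()) {
            return false;
        }
        if (!file.delete()) {
            throw new IOException("Cannot delete " + file);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "store-demo");
        File msg = new File(dir, "msg-0001.xml");
        save(msg, "<ping/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(load(msg), StandardCharsets.UTF_8)); // <ping/>
        System.out.println(delete(msg));  // true
        System.out.println(msg.exists()); // false
    }
}
```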
You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.
Java's file API is largely independent of the OS. You just need to make sure you use File.separator (not File.pathSeparator, which separates entries in path lists such as the classpath), or use the constructor File(File parent, String child) so that you don't need to mention the separator explicitly.
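For the "sensible hierarchy" from the question, a date-based layout can be assembled entirely with File(File parent, String child), so no separator is ever written by hand. The root/yyyy/MM/dd/<id>.xml layout and the names below are assumptions for illustration only.

```java
import java.io.File;
import java.time.LocalDate;

public class MessagePaths {

    // Hypothetical layout: root/yyyy/MM/dd/<messageId>.xml.
    // Each level is added with File(File parent, String child),
    // which inserts the platform's name separator (File.separator) for us.
    static File messageFile(File root, LocalDate day, String messageId) {
        File year = new File(root, String.format("%04d", day.getYear()));
        File month = new File(year, String.format("%02d", day.getMonthValue()));
        File date = new File(month, String.format("%02d", day.getDayOfMonth()));
        return new File(date, messageId + ".xml");
    }

    public static void main(String[] args) {
        File f = messageFile(new File("messages"), LocalDate.of(2024, 3, 7), "req-42");
        // messages/2024/03/07/req-42.xml on Unix, messages\2024\03\07\req-42.xml on Windows
        System.out.println(f.getPath());
        // File.separator splits names inside one path ("/" or "\");
        // File.pathSeparator splits entries of a path list ( ":" or ";").
        System.out.println(File.separator + " vs " + File.pathSeparator);
    }
}
```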
The Java file API is relatively high-level, abstracting the differences between the many operating systems, and most of the time it is sufficient. It only falls short when you need a relatively OS-specific feature that is not in the API, e.g. the physical size of a file on disk (as opposed to its logical size), security rights on *nix, or the free space/quota of the hard drive. Newer releases cover part of this: File.getUsableSpace() has reported free space since Java 6, and java.nio.file has exposed POSIX permissions since Java 7.
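A small sketch of what the standard library does report for a freshly written file: its logical length, the usable space on its partition, and its POSIX permissions when the filesystem supports them. FileInfoDemo and the 1000-byte test file are illustrative choices.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class FileInfoDemo {

    // Returns permissions like "rw-------", or null where POSIX attributes are unsupported (e.g. Windows).
    static String posixPermissions(Path p) throws IOException {
        if (!FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
            return null;
        }
        return PosixFilePermissions.toString(Files.getPosixFilePermissions(p));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("info", ".xml");
        Files.write(tmp, new byte[1000]);
        File f = tmp.toFile();
        // Logical size in bytes; the blocks actually allocated on disk are not reported.
        System.out.println("length      = " + f.length());
        // Estimate of the bytes available to this JVM on the file's partition.
        System.out.println("usableSpace = " + f.getUsableSpace());
        // POSIX permissions via java.nio.file, when the filesystem supports them.
        System.out.println("permissions = " + posixPermissions(tmp));
        Files.delete(tmp);
    }
}
```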
Most operating systems have an internal buffer for file writes and reads. Calling FileOutputStream.write and FileOutputStream.flush ensures the data has been handed to the OS, but not necessarily written to disk. The Java API also supports the lower-level integration that systems such as databases need to manage these buffering issues (example here), for instance FileDescriptor.sync() and FileChannel.force().
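A sketch of forcing the data past the OS buffers: FileDescriptor.sync(), reached through FileOutputStream.getFD(), blocks until the OS reports its buffers for that file flushed to the device; FileChannel.force(true) is the similar NIO call. How durable this really is still depends on the OS and on the disk's own write cache. DurableWrite is an illustrative name.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DurableWrite {

    // Writes the bytes and waits until the OS reports them synchronized with the device.
    static void writeDurably(Path target, byte[] content) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target.toFile())) {
            out.write(content);  // hands the bytes to the OS
            out.flush();         // no-op for a raw FileOutputStream, kept for symmetry
            out.getFD().sync();  // asks the OS to push its buffers to the device
            // Similar NIO alternative: out.getChannel().force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("durable", ".xml");
        writeDurably(p, "<response/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        Files.delete(p);
    }
}
```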
Also, both files and directories are abstracted by File, and you need to check which one you have with isDirectory. This can be confusing, for instance if you have one file x and one directory /x (I don't remember exactly how to handle this issue, but there is a way).
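Because a single File type covers both cases, code that walks or cleans up the message store usually asks explicitly what a path points to. A minimal sketch with isDirectory() and isFile(); the kind helper and class name are illustrative, and the sketch only shows the checks themselves.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FileOrDirectory {

    // Describes what a File object actually points to on disk.
    static String kind(File f) {
        if (f.isDirectory()) return "directory";
        if (f.isFile()) return "file";
        return f.exists() ? "other" : "missing";
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("kinds").toFile();
        File dir = new File(root, "x");
        File file = new File(dir, "x.xml");
        if (!dir.mkdir() || !file.createNewFile()) {
            throw new IOException("setup failed");
        }
        System.out.println(kind(dir));                    // directory
        System.out.println(kind(file));                   // file
        System.out.println(kind(new File(root, "nope"))); // missing
        // A trailing separator is dropped when the File is built,
        // so "x/" and "x" denote the same abstract pathname.
        System.out.println(new File("x/").equals(new File("x"))); // true
    }
}
```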
Web service
The web service can use either xs:base64Binary to pass the data, or MTOM (Message Transmission Optimization Mechanism) if the files are large.
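To make the size trade-off concrete: with xs:base64Binary the bytes travel inline as Base64 text, which is about a third larger than the raw data, whereas MTOM sends them as a binary MIME attachment. The sketch below only measures that encoding overhead with java.util.Base64 (Java 8+); it does not configure JAX-WS or MTOM, and the 300 kB payload is an arbitrary example.

```java
import java.util.Base64;

public class Base64Overhead {

    // Length of the Base64 text for the given bytes (standard alphabet, with padding).
    static int encodedLength(byte[] raw) {
        return Base64.getEncoder().encode(raw).length;
    }

    public static void main(String[] args) {
        byte[] message = new byte[300_000]; // stand-in for a 300 kB message
        int encoded = encodedLength(message);
        // Every 3 input bytes become 4 output characters.
        System.out.println(encoded); // 400000
        System.out.printf("overhead: %.0f%%%n", 100.0 * (encoded - message.length) / message.length); // overhead: 33%
    }
}
```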
Transactions
Note that the database is transactional and the filesystem is not. So you might have to add a few checks in case operations fail and are retried.
You could go with a complicated design involving some form of distributed transaction (see this answer), or try a simpler design that provides the level of robustness you need. A possible design could be:
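One possible sequence for keeping a file and its metadata row consistent without a distributed transaction, sketched here as an assumption rather than a definitive design: write the content to a temporary file, sync it, atomically rename it into place, then insert the metadata row, and delete the file again if the insert fails. MetadataStore is a hypothetical interface standing in for the JDBC code; the demo backs it with an in-memory map.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

public class FileThenMetadata {

    // Hypothetical stand-in for the metadata table; real code would use JDBC inside a transaction.
    interface MetadataStore {
        void insert(String messageId, Path file) throws Exception;
    }

    // 1. write to a temp file in the target directory and sync it,
    // 2. atomically rename it to its final name,
    // 3. insert (and commit) the metadata row,
    // 4. if step 3 fails, delete the file again.
    // A crash between steps 2 and 3 can still leave an orphan file; a periodic
    // sweep for files without a metadata row can clean those up.
    static Path store(Path dir, String messageId, byte[] content, MetadataStore db) throws Exception {
        Files.createDirectories(dir);
        Path target = dir.resolve(messageId + ".xml");
        Path tmp = Files.createTempFile(dir, messageId, ".tmp");
        try {
            try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
                out.write(content);
                out.getFD().sync();
            }
            // Same directory, so the rename stays on one filesystem.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        try {
            db.insert(messageId, target);
        } catch (Exception e) {
            Files.deleteIfExists(target); // manual "rollback" of the file
            throw e;
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Path> table = new HashMap<>();
        Path dir = Files.createTempDirectory("msgs");
        Path p = store(dir, "req-1", "<req/>".getBytes(StandardCharsets.UTF_8), table::put);
        System.out.println(p.equals(table.get("req-1"))); // true
    }
}
```

Writing the file first means a failure can at worst leave an unreferenced file, never a metadata row pointing at missing content.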
This is not as robust as writing BLOBs in a real transactional database, but it provides some robustness. Otherwise, you could have a look at commons-transaction, but I feel the project is dead (2007).
There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.
Support for XML and JSON seems to be experimental.