How to manage files with indexes on a file system using Java

I am planning to develop a server application to support and handle high-volume data migration.

Imagine this as a queue-based platform where the client programs (source agents that pull metadata from a content management system) send data packets (approximately 1KB each) to the server, and the server stores these packets in its designated file system.

The server will categorize each data packet based on some of its header information, and should be able to retrieve and return the appropriate data packet when it is queried using some of that header information.

This would be easy with a standard DBMS if the metadata were well defined, but in my case the packet header information will change over time, and I don't want to redesign my database frequently.

The challenge I see here is to store the packet files in a file system efficiently (so that it won't affect the file server's performance) and also to maintain indexing information that can be used to locate the appropriate packets when requested.

I am thinking about using a non-DBMS open source framework (Java-based, perhaps NoSQL?) that can serve this purpose. The number of packets can range from a few hundred thousand to several million, depending on the volume of the source repository.

I'd appreciate your input.

A column-oriented database such as Apache Cassandra could handle this scenario. The indexing provided in Cassandra is relatively basic, but would probably be sufficient here. Several million 1KB values is a pretty small dataset for Cassandra and should be no problem at all.

Additional metadata columns could be written alongside the main data packets; the column names can be decided on-the-fly if desired, so this would allow your header format to evolve.
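
To make the idea of evolving header columns concrete, here is a minimal in-memory sketch in plain Java. It is not Cassandra's API; the class `PacketStore` and its methods are hypothetical names made up for illustration. It only shows the principle: each packet carries whatever header fields it has, and the index is keyed by field name and value, so a new header field needs no schema change.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: this is NOT Cassandra's API.
class PacketStore {
    // packet id -> raw payload
    private final Map<String, byte[]> payloads = new HashMap<>();
    // header field -> header value -> ids of packets carrying that pair
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    // Stores a packet and indexes every header it happens to carry.
    // Assumes each id is written once; re-writing an id would leave stale index entries.
    void put(String id, byte[] payload, Map<String, String> headers) {
        payloads.put(id, payload);
        for (Map.Entry<String, String> h : headers.entrySet()) {
            index.computeIfAbsent(h.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(h.getValue(), k -> new TreeSet<>())
                 .add(id);
        }
    }

    // Returns the ids of all packets whose header `field` equals `value`.
    Set<String> find(String field, String value) {
        Map<String, Set<String>> byValue = index.getOrDefault(field, Collections.emptyMap());
        return Collections.unmodifiableSet(byValue.getOrDefault(value, Collections.emptySet()));
    }

    byte[] get(String id) {
        return payloads.get(id);
    }

    public static void main(String[] args) {
        PacketStore store = new PacketStore();
        Map<String, String> h1 = new HashMap<>();
        h1.put("source", "cms-a");
        h1.put("type", "doc");
        store.put("p1", new byte[1024], h1);

        // A later packet introduces a new header field ("owner") without any schema change.
        Map<String, String> h2 = new HashMap<>();
        h2.put("source", "cms-a");
        h2.put("owner", "alice");
        store.put("p2", new byte[1024], h2);

        System.out.println(store.find("source", "cms-a")); // prints [p1, p2]
        System.out.println(store.find("owner", "alice"));  // prints [p2]
    }
}
```

In Cassandra the equivalent would be extra columns written next to the payload; the sketch above just keeps everything in memory to show the lookup pattern.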

Cassandra collects incoming data in in-memory tables before writing it to disk efficiently as immutable "SSTables". Each write is also appended immediately to a commit log to provide durability in case of a crash.
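
The write path described above can be sketched in a few lines of plain Java. This is a heavily simplified illustration, not Cassandra's implementation: the class `TinyLogStore`, the file names, and the flush threshold are all assumptions made for the example, and it assumes keys and values contain no tabs or newlines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified log-structured store; NOT Cassandra's code.
class TinyLogStore {
    private final Path dir;
    private final Path commitLog;
    private final int flushThreshold;
    // Sorted in-memory table holding writes that are not yet in a segment file.
    private final TreeMap<String, String> memtable = new TreeMap<>();
    private int segmentCount = 0;

    TinyLogStore(Path dir, int flushThreshold) throws IOException {
        this.dir = Files.createDirectories(dir);
        this.commitLog = this.dir.resolve("commit.log");
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) throws IOException {
        // 1. Append to the commit log first, so the write can be replayed after a crash.
        //    (A real system would also force the data to disk.)
        Files.write(commitLog, (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // 2. Apply the write to the in-memory table.
        memtable.put(key, value);
        // 3. Once the table is large enough, flush it to a new segment file.
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    private void flush() throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : memtable.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue()); // already in key order
        }
        // CREATE_NEW fails if the file exists: segments are written once and never modified.
        Path segment = dir.resolve("segment-" + segmentCount + ".txt");
        Files.write(segment, lines, StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
        segmentCount++;
        memtable.clear();
        // The flushed writes are now safely in the segment, so the log can be discarded.
        Files.deleteIfExists(commitLog);
    }

    int segmentCount() { return segmentCount; }

    int memtableSize() { return memtable.size(); }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tinylog");
        TinyLogStore store = new TinyLogStore(dir, 2);
        store.write("b", "2");
        store.write("a", "1"); // reaches the threshold and triggers a flush
        store.write("c", "3"); // stays in the memtable and commit log
        System.out.println(Files.readAllLines(dir.resolve("segment-0.txt")));
    }
}
```

Because writes are only ever appended to the log or written out as whole sorted files, the disk sees sequential I/O rather than many small random updates, which is the main reason this layout handles a large number of small packets well.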
