How do I read a memory-mapped file in a specific format?

Question

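I am working with memory-mapped files in Java. In the memory-mapped file I have stored data for specific user IDs in Avro binary encoding.

The memory-mapped file consists of two main parts:
- a header, which is an index of the complete file contents; it answers questions about the file and gives the file offset of each user's data;
- a body, which follows the header and holds each user's data at the given offset.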

Header

version                     4 bytes
last_modified_date          8 bytes
users                       4 bytes
shards                      4 bytes
the shards                  N * 4 bytes
num_hash_index              4 bytes
num_chain_slots             4 bytes
user offset/size index      num_hash_index * num_chain_slots * (8 bytes + 8 bytes + 4 bytes)
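
As a rough aid for locating where the body starts, the header length can be computed from the field sizes listed above. This is only a sketch: it assumes the fields are packed back to back with no padding and that N is the value stored in the shards field. The names HeaderLayout and headerSize are made up for illustration.

```java
class HeaderLayout {
    // Fixed-size fields: version (4) + last_modified_date (8) + users (4) + shards (4).
    static final int FIXED_PREFIX = 4 + 8 + 4 + 4;
    // Each index entry is 8 + 8 + 4 bytes according to the layout above.
    static final int INDEX_ENTRY_SIZE = 8 + 8 + 4;

    // Returns the assumed total header length in bytes (no padding between fields).
    static long headerSize(int numShards, int numHashIndex, int numChainSlots) {
        long shardList = 4L * numShards;
        long counters = 4 + 4; // num_hash_index + num_chain_slots
        long index = (long) numHashIndex * numChainSlots * INDEX_ENTRY_SIZE;
        return FIXED_PREFIX + shardList + counters + index;
    }
}
```

For example, with 3 shards, 10 hash slots and 2 chain slots this gives 20 + 12 + 8 + 400 = 440 bytes.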

The header is followed by the body, laid out as follows.

Body

number of records                   2 bytes         how many records does this user have?
a repeated sequence of records      variable size   as described below

Every record follows this specification:

attribute key                       X bytes     a string of the users key.
key delimiter                       1 byte      '\0'
client id                           2 bytes     some client id
last modified time (in ms)          8 bytes     This is the last modified time for this attribute in ms.
length of the avro binary data      2 bytes     actual length of avro binary data
the binary avro data or text        Y bytes     Length given by the previous field.

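I have already generated many files in the format above and need to read them from a Java program. What is the best way to do that in Java? This is my first time working with memory-mapped files, so I am trying to understand how to go about it.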

FileChannel fc = new RandomAccessFile(new File("c:/tmp/file.txt"), "rw").getChannel();

I am not sure what to do next. Any example would help me understand this better.

Answer 1

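This should do it. The key is the set of DataInputStream methods that read bytes and convert them to values. I assume the byte order is suitable (DataInputStream reads multi-byte values in big-endian order).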

 ByteBuffer buf = ByteBuffer.allocate( 9999 ); // capacity
 int nRead = fc.read( buf );
 InputStream is = new ByteArrayInputStream( buf.array() );
 DataInputStream dis = new DataInputStream( is );
 int version = dis.readInt(); //                   4 bytes
 long timestamp = dis.readLong();  //                 8 bytes
 int numUsers = dis.readInt(); //                   4 bytes

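And so on.

More details on the body

There is no need to store the key delimiter ('\0') or the length of the Avro data; the latter is given by the length of the byte array. I use int to hold the unsigned shorts, just to be safe (Java has no unsigned short).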
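
To illustrate why int is the safer choice here, the sketch below reads a 2-byte field as an unsigned value from a ByteBuffer. The class and method names are hypothetical; Short.toUnsignedInt is a standard method available since Java 8.

```java
import java.nio.ByteBuffer;

class UnsignedShortDemo {
    // Reads two bytes at the buffer's current position as an unsigned 16-bit value.
    static int readUnsignedShort(ByteBuffer buf) {
        return Short.toUnsignedInt(buf.getShort());
    }
}
```

A plain getShort() or readShort() on the bytes 0xFF 0xFE returns a negative number, which would be wrong for a length or a count, while the unsigned read returns 65534.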

public class UserAttribute {
  // fields are not final because the reading code below uses setters
  private String attributeKey;
  private int schemaId;               // unsigned short
  private long lastModifiedDate;
  private byte[] avroBinaryData;      // preceded by length: unsigned short
  // no-arg constructor, setters and getters here

}

int numberOfAttributes = dis.readUnsignedShort();   // record count is a 2-byte field
List<UserAttribute> ual = new ArrayList<>( numberOfAttributes );
for( int iAttr = 0; iAttr < numberOfAttributes; ++iAttr ){
    // read values for one attribute, create UserAttribute object
    UserAttribute ua = new UserAttribute();
    StringBuilder sb = new StringBuilder();
    for(;;){
        int ub = dis.readUnsignedByte(); // can this be in ISO-8859-1 > 0x80?
        if( ub == 0 ) break;             // '\0' key delimiter
        sb.append( (char)ub );
    }
    ua.setAttributeKey( sb.toString() );
    ua.setSchemaId( dis.readUnsignedShort() );
    ua.setLastModifiedDate( dis.readLong() );
    int loabd = dis.readUnsignedShort(); // length of avro binary data
    byte[] abd = new byte[loabd];
    dis.readFully( abd );                // fill the whole array
    ua.setAvroBinaryData( abd );
    ual.add( ua );
}
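
Since a mapped file is exposed as a ByteBuffer rather than a stream, the same record layout can also be read with the ByteBuffer getters. The following is a minimal sketch, not a definitive implementation: it assumes big-endian values (the ByteBuffer default), single-byte ISO-8859-1 keys, and a buffer positioned at the start of one user's data. RecordParser, UserRecord and parseUser are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RecordParser {
    // Hypothetical holder for one record; fields follow the layout in the question.
    static final class UserRecord {
        final String key;
        final int clientId;
        final long lastModified;
        final byte[] avro;
        UserRecord(String key, int clientId, long lastModified, byte[] avro) {
            this.key = key; this.clientId = clientId;
            this.lastModified = lastModified; this.avro = avro;
        }
    }

    // Parses one user's block starting at the buffer's current position.
    static List<UserRecord> parseUser(ByteBuffer buf) {
        int count = Short.toUnsignedInt(buf.getShort());      // number of records
        List<UserRecord> records = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int start = buf.position();
            while (buf.get() != 0) { /* scan to the '\0' key delimiter */ }
            byte[] keyBytes = new byte[buf.position() - start - 1];
            buf.position(start);
            buf.get(keyBytes);
            buf.get();                                         // skip the delimiter
            int clientId = Short.toUnsignedInt(buf.getShort());
            long lastModified = buf.getLong();
            byte[] avro = new byte[Short.toUnsignedInt(buf.getShort())];
            buf.get(avro);
            // Assumption: keys are single-byte ISO-8859-1 text.
            records.add(new UserRecord(new String(keyBytes, StandardCharsets.ISO_8859_1),
                                       clientId, lastModified, avro));
        }
        return records;
    }
}
```

The Avro payload is returned as raw bytes; decoding it needs the matching schema and is left out here.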

Also, I think reading the shards should be

int numShards = dis.readInt(); // 4 bytes 1..101
int[] shards = new int[numShards];
for( int il = 0; il < numShards; ++il ){
    shards[il] = dis.readInt(); //  N * 4 bytes     Where N is the number of shards
}

Later still: memory mapping

int read = ...;
FileChannel fc = new RandomAccessFile(file, "rw").getChannel();
ByteBuffer buffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, read );
buffer.order(ByteOrder.BIG_ENDIAN);

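This yields a ByteBuffer of the given length containing the file data. If the file is larger than 0x7fffffff bytes, it must be mapped in chunks, which can be done with the same FileChannel method (map).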
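
Mapping in chunks could look roughly like the sketch below. It is an illustration under assumptions rather than a tested recipe for this file format: ChunkedMapper and mapInChunks are hypothetical names, and the chunk size is left to the caller.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class ChunkedMapper {
    // Maps the whole file read-only as consecutive regions of at most chunkSize bytes.
    static List<MappedByteBuffer> mapInChunks(Path file, long chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
        }
        List<MappedByteBuffer> chunks = new ArrayList<>();
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = fc.size();
            for (long pos = 0; pos < size; pos += chunkSize) {
                long len = Math.min(chunkSize, size - pos);
                chunks.add(fc.map(FileChannel.MapMode.READ_ONLY, pos, len));
            }
        }
        return chunks; // a mapping stays valid after its channel is closed
    }
}
```

A record may straddle a chunk boundary, so in practice it can be simpler to map only the region needed for one user, using the offset and size taken from the header index.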

Answer 1 (accepted) 2014-12-28 07:48:25