
Parsing a byte array containing fields of unknown length

I am parsing a byte array in Java that has the following specification:

Trace data format:
    - 4 bytes containing the Id.
    - 4 bytes containing the address.
    - N bytes containing the first name, where 0 < N < 32
    - N bytes containing the last name, where 0 < N < 32
    - 4 bytes containing the Minimum
    - 4 bytes containing the Maximum 
    - 4 bytes containing the Resource Default Level

So far I don't see any way to parse this array into 7 variables of the correct types. Can you confirm this, or am I missing something, such as a magic function in Java that finds the String "limits" in a byte array? (I can't see how the bytes of the Minimum value can be distinguished from ASCII characters.)

Is there any "convention" about a special character between the 2 strings?

Well, you know that the first name starts at byte 9 (offset 8) and that the last name ends at offset length - 13, just before the three trailing 4-byte fields. What is uncertain is where the first name ends and the last name begins. I see a few possible solutions:

  • If the format was defined by a C programmer, the two name fields are most likely terminated by a null byte, since that's the C convention for strings.
  • If it was defined by a Java programmer, it could have been written with writeUTF(), which prefixes each string with a two-byte length, so the byte count in the specification is most likely wrong. However, this at least specifies the encoding, which is otherwise an open question.
  • If it was defined by a COBOL programmer, the two fields could be fixed-length and padded with zeroes or spaces, with the format specification listing the payload length rather than the field length.
  • If it was defined by a really incompetent programmer (in whatever language), it contains the two names without a delimiter or count, so it's not possible to reliably separate them. If you don't have the information, there's no "magic" function in Java or elsewhere that can conjure it out of thin air. I suppose you could hope the last name always starts with an uppercase letter and nobody uses double names or all-caps.
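To make the fixed-length, padded possibility concrete, here is a minimal sketch. The field width is not given anywhere in the specification, so fieldWidth is a pure assumption, as are the class and method names and the choice of ASCII.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class PaddedNameSketch {
    /**
     * Decodes one fixed-width name field and strips trailing padding.
     * fieldWidth is an assumption; the specification does not state it.
     */
    static String readPadded(ByteBuffer buf, int fieldWidth) {
        byte[] raw = new byte[fieldWidth];
        buf.get(raw);
        int end = fieldWidth;
        // Drop trailing zero bytes or spaces used as padding.
        while (end > 0 && (raw[end - 1] == 0 || raw[end - 1] == ' ')) {
            end--;
        }
        return new String(raw, 0, end, StandardCharsets.US_ASCII);
    }

    /** Returns {firstName, lastName} from a record with two padded name fields. */
    static String[] readNames(byte[] record, int fieldWidth) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        buf.position(8); // skip the 4-byte Id and the 4-byte address
        String first = readPadded(buf, fieldWidth);
        String last = readPadded(buf, fieldWidth);
        return new String[] { first, last };
    }
}
```

If the total record length is constant across all records, that is a hint that this layout applies, and the width can be derived as (length - 20) / 2.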

Is there any "convention" about a special character between the 2 strings?

Well, C strings are often null-terminated (\0).

If there is no such character, I would say that it is impossible to parse the structure.

Assuming the first and last names are null-terminated, you would do it like this:

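import java.nio.charset.StandardCharsets;

// The names start at offset 8, after the 4-byte Id and the 4-byte address.
int firstNameStart = 8;
int firstNameLength = 0;
// Scan for the terminating null byte (the value 0, not the character '0').
while (firstNameLength < 32 && theArray[firstNameStart + firstNameLength] != 0) {
    firstNameLength++;
}
// The last name begins right after the first name's terminator.
int lastNameStart = firstNameStart + firstNameLength + 1;
int lastNameLength = 0;
while (lastNameLength < 32 && theArray[lastNameStart + lastNameLength] != 0) {
    lastNameLength++;
}
// Decode only the name bytes; ASCII is assumed here.
String firstName = new String(theArray, firstNameStart, firstNameLength, StandardCharsets.US_ASCII);
String lastName = new String(theArray, lastNameStart, lastNameLength, StandardCharsets.US_ASCII);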
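For completeness, here is a sketch that pulls all seven fields out of one record under the same null-terminator assumption. The Trace holder, the method names, ASCII, and big-endian byte order are all assumptions of this example; none of them come from the specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class NullTerminatedTraceParser {
    /** Plain holder for the seven fields of one trace record (hypothetical type). */
    static final class Trace {
        int id, address, minimum, maximum, defaultLevel;
        String firstName, lastName;
    }

    /** Reads bytes up to the next zero byte and consumes the terminator. */
    static String readCString(ByteBuffer buf) {
        int start = buf.position();
        while (buf.get() != 0) {
            // keep scanning; get() throws BufferUnderflowException if no terminator exists
        }
        int length = buf.position() - 1 - start;
        return new String(buf.array(), buf.arrayOffset() + start, length, StandardCharsets.US_ASCII);
    }

    static Trace parse(byte[] data) {
        // Byte order is an assumption; switch to LITTLE_ENDIAN if the values look wrong.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        Trace t = new Trace();
        t.id = buf.getInt();
        t.address = buf.getInt();
        t.firstName = readCString(buf);
        t.lastName = readCString(buf);
        t.minimum = buf.getInt();
        t.maximum = buf.getInt();
        t.defaultLevel = buf.getInt();
        return t;
    }
}
```

A quick sanity check for this layout: after parsing, buf.remaining() should be 0. If bytes are left over or the buffer underflows, the names are probably not null-terminated.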

If you want to read N ASCII bytes and turn them into a String:

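import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Reads exactly num bytes from the stream and decodes them as ASCII.
public static String readString(DataInputStream dis, int num) throws IOException {
    byte[] bytes = new byte[num];
    dis.readFully(bytes);
    return new String(bytes, StandardCharsets.US_ASCII);
}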

For the rest of the values, you can use:

dis.readInt();
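One caveat: readInt() always decodes the four bytes as big-endian. The specification does not say which byte order the writer used, so the sketch below takes that as a parameter. The class name, the method name, and the littleEndian flag are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;

final class TrailingIntsSketch {
    /** Reads the three trailing 4-byte values: {minimum, maximum, defaultLevel}. */
    static int[] readTrailingInts(DataInputStream dis, boolean littleEndian) throws IOException {
        int[] values = new int[3];
        for (int i = 0; i < values.length; i++) {
            int v = dis.readInt(); // readInt() always decodes big-endian
            // Swap the byte order if the producer wrote little-endian values.
            values[i] = littleEndian ? Integer.reverseBytes(v) : v;
        }
        return values;
    }
}
```

If the decoded Minimum and Maximum come out as implausibly large numbers, trying the other byte order is a cheap first check.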

If you are asking whether there is any way to know how long the strings are, I don't believe you can determine this from the information provided. Perhaps the strings are terminated by a 0 byte or have their length as the first byte. If you look at the bytes in the file, you may see what the format is.

od -xc my-format.bin
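If the dump shows a small number in front of each name that matches the count of characters following it, the strings are probably length-prefixed. A minimal sketch for that case, assuming a single unsigned length byte and ASCII content (both assumptions, as are the class and method names):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedSketch {
    /** Reads one string stored as a single length byte followed by that many ASCII bytes. */
    static String readLengthPrefixed(DataInputStream dis) throws IOException {
        int length = dis.readUnsignedByte(); // 0..255; the spec promises fewer than 32
        byte[] bytes = new byte[length];
        dis.readFully(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

Calling it twice in a row, after the two leading readInt() calls, would yield the first name and then the last name.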

Just to add another possibility to Michael's answer.

Assuming that N is the same for both fields (and since the same letter is used for both, I would guess that this is the case), the field positions would be like this:

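int len = array.length;
// The five 4-byte fields take 20 bytes; the rest is split evenly between the two names.
int varLen = (len - 5*4) / 2;
int[] fieldPos = new int[7];
fieldPos[0] = 0;                   // Id
fieldPos[1] = 4;                   // address
fieldPos[2] = 8;                   // first name
fieldPos[3] = 8 + varLen;          // last name
fieldPos[4] = 8 + 2*varLen;        // Minimum
fieldPos[5] = 8 + 2*varLen + 4;    // Maximum
fieldPos[6] = 8 + 2*varLen + 8;    // Resource Default Level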
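Under the equal-length assumption, N is half of what remains after the five 4-byte fields, so an odd remainder immediately rules the assumption out. A sketch of decoding with these positions follows; the class and method names, ASCII, and big-endian integers are all assumptions of this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class EqualLengthNamesSketch {
    /** Returns {firstName, lastName}, assuming both occupy the same number of bytes. */
    static String[] splitNames(byte[] array) {
        int nameBytes = array.length - 5 * 4; // bytes left after the five 4-byte fields
        if (nameBytes <= 0 || nameBytes % 2 != 0) {
            throw new IllegalArgumentException("record length does not fit the equal-length assumption");
        }
        int n = nameBytes / 2;
        String first = new String(array, 8, n, StandardCharsets.US_ASCII);
        String last = new String(array, 8 + n, n, StandardCharsets.US_ASCII);
        return new String[] { first, last };
    }

    /** Reads the Minimum field, which follows directly after the two names. */
    static int readMinimum(byte[] array) {
        int n = (array.length - 5 * 4) / 2;
        // ByteBuffer is big-endian by default; that byte order is an assumption.
        return ByteBuffer.wrap(array).getInt(8 + 2 * n);
    }
}
```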

