
Get all occurrences of a byte pattern with varying bytes from a byte array?

How can I match a byte pattern against a larger byte array, and get both the varying data and the position in the array where the pattern ends?


FF FF FF FF XX XX XX XX FF FF FF FF (any length of any bytes goes here) 2E XX XX XX 00


I have the above pattern (where XX is any byte) and I need to get the bolded parts plus the position in the array of the last byte. How can I do this? (Note: I need to get all occurrences of this pattern.)

I cannot convert it to a string because it contains null bytes (0x00), and they are often part of the first four XX XX XX XX bytes.

I've been trying to figure this out for a while now and would appreciate it if you guys could help me out! Thanks.

Edit: the above bytes are in hex.

Who says you can't convert it to a string?

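using System;
using System.Text;

// Sample data containing 0xFF bytes and embedded nulls.
byte[] bytes = new byte[]
{
    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x31, 0x32, 0x33, 0x34, 0xff, 0x2a, 0x00
};
// Decode using the system default encoding.
var s = Encoding.Default.GetString(bytes);
Console.WriteLine(bytes.Length);  // number of bytes
Console.WriteLine(s.Length);      // number of chars
// Print each char's numeric value to compare with the original bytes.
foreach (var c in s)
{
    Console.Write("0x{0:X2}, ", (int)c);
}
Console.WriteLine();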

Both the array and the string report a length of 13, and the values printed from the string are the same as the bytes in the array.

You can convert it to a string. Then you can use regular expressions to find what you're looking for.

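Note that Encoding.Default might not be what you're looking for. You want an 8-bit (single-byte) encoding that doesn't modify any of the characters, such as ISO-8859-1 (Latin-1), which maps each byte value to the character with the same code point. On .NET Core and later, for example, Encoding.Default is UTF-8, which replaces invalid bytes such as 0xFF.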
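As a rough sketch of the string-plus-regex idea, the following decodes the array as ISO-8859-1 so that every byte becomes exactly one char, then runs a regular expression over the result. The class and method names (RegexByteSearch, FindAll) are made up for illustration, and it is an assumption that the parts of interest are the XX bytes, so both wildcard groups are returned along with the index of the final 00 byte.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexByteSearch
{
    // ISO-8859-1 maps each byte 0x00-0xFF to the char with the same value,
    // so string indices line up with byte indices.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // [\s\S] matches any char, including newlines.
    // The lazy *? keeps the middle section as short as possible.
    private static readonly Regex Pattern = new Regex(
        @"\xFF{4}([\s\S]{4})\xFF{4}([\s\S]*?)\x2E([\s\S]{3})\x00",
        RegexOptions.CultureInvariant);

    // For each match: the four header XX bytes, the three trailer XX bytes,
    // and the index of the final 0x00 byte in the input array.
    public static List<(byte[] Header, byte[] Trailer, int EndIndex)> FindAll(byte[] data)
    {
        var results = new List<(byte[] Header, byte[] Trailer, int EndIndex)>();
        string text = Latin1.GetString(data);
        foreach (Match m in Pattern.Matches(text))
        {
            byte[] header = Latin1.GetBytes(m.Groups[1].Value);
            byte[] trailer = Latin1.GetBytes(m.Groups[3].Value);
            results.Add((header, trailer, m.Index + m.Length - 1));
        }
        return results;
    }
}
```

Calling RegexByteSearch.FindAll(bytes) returns one entry per non-overlapping occurrence. Matches are found left to right, so each one starts at the earliest available FF FF FF FF header.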

But if you want an algorithmic way to do it, a couple of approaches spring to mind. The first (and probably the easiest) is to scan forward looking for 2E followed by any three bytes and then 00. Then start at the beginning again and see if you find FF FF FF FF XX XX XX XX FF FF FF FF. That's not the fastest way to do things, but it's pretty easy.
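Here is a minimal sketch of a plain forward scan in that spirit. It is a variation rather than a literal transcription: it looks for the leftmost header first and then for the first 2E XX XX XX 00 after it, then resumes after that terminator to collect all occurrences. The names ForwardScan, IsHeaderAt, IsTrailerAt and FindAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ForwardScan
{
    // True if FF FF FF FF ?? ?? ?? ?? FF FF FF FF starts at index i.
    private static bool IsHeaderAt(byte[] d, int i)
    {
        if (i < 0 || i + 12 > d.Length) return false;
        for (int k = 0; k < 4; k++)
            if (d[i + k] != 0xFF || d[i + 8 + k] != 0xFF) return false;
        return true;
    }

    // True if 2E ?? ?? ?? 00 starts at index i.
    private static bool IsTrailerAt(byte[] d, int i)
    {
        return i >= 0 && i + 5 <= d.Length && d[i] == 0x2E && d[i + 4] == 0x00;
    }

    // Returns (index of the first header byte, index of the final 0x00)
    // for every non-overlapping occurrence, scanning left to right.
    public static List<(int Start, int End)> FindAll(byte[] data)
    {
        var results = new List<(int Start, int End)>();
        int pos = 0;
        while (pos < data.Length)
        {
            // Leftmost header at or after pos.
            int start = pos;
            while (start < data.Length && !IsHeaderAt(data, start)) start++;
            if (start >= data.Length) break;

            // First terminator after the 12-byte header.
            int trailer = start + 12;
            while (trailer < data.Length && !IsTrailerAt(data, trailer)) trailer++;
            if (trailer >= data.Length) break;

            int end = trailer + 4;
            results.Add((start, end));
            pos = end + 1; // continue after this occurrence
        }
        return results;
    }
}
```

For a result (Start, End), the first group of XX bytes sits at data[Start + 4] through data[Start + 7], and the last group at data[End - 3] through data[End - 1].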

Note that if you search backwards from the 2E, you could end up "finding" a shorter string. That is, if your input was:

FF FF FF FF XX XX XX XX FF FF FF FF 01 02 FF FF FF FF XX XX XX XX FF FF FF FF 0A 0B 2E XX XX XX 00

There are two occurrences of the starting pattern. If you searched backwards from the 2E, you'd match the second one, which probably isn't what you want.
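To make the difference concrete, this small sketch lists every offset at which the starting pattern begins. HeaderOffsets and FindHeaders are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderOffsets
{
    // Returns every index at which FF FF FF FF ?? ?? ?? ?? FF FF FF FF begins.
    public static List<int> FindHeaders(byte[] d)
    {
        var offsets = new List<int>();
        for (int i = 0; i + 12 <= d.Length; i++)
        {
            bool match = true;
            for (int k = 0; k < 4 && match; k++)
                match = d[i + k] == 0xFF && d[i + 8 + k] == 0xFF;
            if (match) offsets.Add(i);
        }
        return offsets;
    }
}
```

If the XX placeholders in the input above are filled with arbitrary non-FF values (an assumption for the sake of the example), the headers begin at offsets 0 and 14 and the 2E is at offset 28. A forward search takes the header at 0, while a backward search from the 2E reaches the one at 14 first and reports the shorter span.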

The other way is to build yourself a little state machine that searches forward. That'll be faster, but a bit more difficult.
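One possible way to build such a state machine, offered as a sketch and not necessarily what was meant here, is the bit-parallel Shift-And technique: each bit of a small integer tracks one partial match, so wildcard positions are handled without backtracking and each input byte is examined once. The names StateMachineScan, BuildMasks and FindAll, and the use of -1 as the "any byte" marker, are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class StateMachineScan
{
    private const int Any = -1; // marker for an XX (any byte) position

    private static readonly int[] Header =
        { 0xFF, 0xFF, 0xFF, 0xFF, Any, Any, Any, Any, 0xFF, 0xFF, 0xFF, 0xFF };
    private static readonly int[] Trailer = { 0x2E, Any, Any, Any, 0x00 };

    // masks[b] has bit i set when pattern position i accepts byte value b.
    private static uint[] BuildMasks(int[] pattern)
    {
        var masks = new uint[256];
        for (int i = 0; i < pattern.Length; i++)
            for (int b = 0; b < 256; b++)
                if (pattern[i] == Any || pattern[i] == b)
                    masks[b] |= 1u << i;
        return masks;
    }

    // Returns (index of the first header byte, index of the final 0x00)
    // for every non-overlapping occurrence, in a single pass.
    public static List<(int Start, int End)> FindAll(byte[] data)
    {
        uint[] headerMasks = BuildMasks(Header);
        uint[] trailerMasks = BuildMasks(Trailer);
        uint headerDone = 1u << (Header.Length - 1);
        uint trailerDone = 1u << (Trailer.Length - 1);

        var results = new List<(int Start, int End)>();
        bool inBody = false; // false: seeking header, true: seeking trailer
        uint state = 0;      // bit i set: a partial match of length i + 1 ends here
        int start = -1;

        for (int i = 0; i < data.Length; i++)
        {
            uint[] masks = inBody ? trailerMasks : headerMasks;
            // Extend every partial match by one byte and start a new one.
            state = ((state << 1) | 1u) & masks[data[i]];

            if (!inBody && (state & headerDone) != 0)
            {
                start = i - Header.Length + 1;
                inBody = true;
                state = 0;
            }
            else if (inBody && (state & trailerDone) != 0)
            {
                results.Add((start, i));
                inBody = false;
                state = 0;
            }
        }
        return results;
    }
}
```

A naive counter that resets to zero on a mismatch can miss a header that begins inside a run of FF bytes. Tracking all partial matches at once avoids that, at the cost of slightly less obvious code.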
