Best way to extract strings from binary data in golang using unsafe

I have an application which loads a byte array of several gigabytes. I don't have control of the binary format. The program spends most of its time converting sections of the array into strings, doing string manipulation and then releasing all of the strings. It occasionally runs out of memory when large numbers of clients cause large numbers of objects to be allocated.

Given that the byte array lives in memory for the entire life of the application, it seems like an ideal candidate for using the unsafe package to avoid memory allocation.

Testing this out in the Go playground, it appears a "StringHeader" is needed to generate an actual string. But this means a "StringHeader" must still be allocated every time a string needs to be returned (i.e. the "x" variable in this example).

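package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    t := []byte{
        65, 66, 67, 68, 69, 70,
        71, 72, 73, 74, 75, 76,
        77, 78, 79, 80, 81, 82,
        83, 84, 85,
    }
    // Preallocated string headers, reused instead of allocating one per string.
    var x [10]reflect.StringHeader

    // Point header 0 at the 4 bytes starting at t[8].
    h := (*reflect.StringHeader)(unsafe.Pointer(&x[0]))
    h.Len = 4
    h.Data = uintptr(unsafe.Pointer(&t[8]))

    fmt.Printf("test %v\n", *(*string)(unsafe.Pointer(&x[0]))) // prints test IJKL

    // Point header 1 at the 4 bytes starting at t[3].
    h = (*reflect.StringHeader)(unsafe.Pointer(&x[1]))
    h.Len = 4
    h.Data = uintptr(unsafe.Pointer(&t[3]))

    fmt.Printf("test %v\n", *(*string)(unsafe.Pointer(&x[1]))) // prints test DEFG
}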

I could probably attach a fixed-length array of string header objects to each client when it connects to the server (recycled when new clients connect).

This means that: 1. string data would no longer be copied around; 2. string headers would not be allocated or garbage collected; and 3. we would know the maximum number of clients per server, because each client has a fixed, hardcoded number of string headers available when pulling out strings.

Am I on track, or crazy? Let me know. Thanks.

Use the following function to convert a byte slice to a string without allocation:

func btos(p []byte) string {
    return *(*string)(unsafe.Pointer(&p))
}

The function takes advantage of the fact that the memory layout for a string header is a prefix of the memory layout for a slice header.
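On Go 1.20 or later, the same zero-copy conversion can probably be written with `unsafe.String` and `unsafe.SliceData`, which avoids relying on the two header layouts matching. This is a minimal sketch, not part of the original answer; the name `btosString` is hypothetical, and the same rule about not modifying the bytes afterwards still applies.

```go
package main

import (
	"fmt"
	"unsafe"
)

// btosString is a hypothetical Go 1.20+ variant of btos.
// It returns a string that shares memory with p; no bytes are copied.
func btosString(p []byte) string {
	return unsafe.String(unsafe.SliceData(p), len(p))
}

func main() {
	t := []byte("ABCDEFGHIJKLMNOPQRSTU") // same bytes as 65..85
	fmt.Println(btosString(t[8:12]))     // IJKL
	fmt.Println(btosString(t[3:7]))      // DEFG
}
```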

Do not modify the backing array of the slice after calling this function; doing so breaks the assumption that strings are immutable.
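To illustrate why the warning matters, here is a small sketch showing that the returned string aliases the slice's memory, so a later write to the slice is visible through the string. The helper `aliasDemo` is hypothetical and exists only for this demonstration.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

func btos(p []byte) string {
	return *(*string)(unsafe.Pointer(&p))
}

// aliasDemo (hypothetical) returns a real copy of the string taken before
// the write, and the aliased string as seen after the write.
func aliasDemo() (before, after string) {
	buf := []byte("IJKL")
	s := btos(buf)             // s shares memory with buf
	before = strings.Clone(s)  // independent copy (Go 1.18+)
	buf[0] = 'X'               // mutates the bytes s points at
	return before, s
}

func main() {
	before, after := aliasDemo()
	fmt.Println(before, after) // IJKL XJKL
}
```

In the question's scenario this is only a concern if the multi-gigabyte array is ever written to while strings derived from it are still in use.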

Use the function like this:

t := []byte{
    65, 66, 67, 68, 69, 70,
    71, 72, 73, 74, 75, 76,
    77, 78, 79, 80, 81, 82,
    83, 84, 85,
}
s := btos(t[8:12])
fmt.Printf("test %v\n", s) // prints test IJKL

s = btos(t[3:7])
fmt.Printf("test %v\n", s) // prints test DEFG
