
What are non-null-terminated strings?

The "sz" part of the prefix is important, because some strings in the Windows world (especially when talking about the DDK) are not zero-terminated.

I read this in the STR, LPSTR section.

Can anyone tell me what these non-null-terminated strings are?

In computer science, a string is a sequence of characters. A sequence has some length: there is some number of characters in it. To work with a string, one generally has to know its length.

The length may be indicated in various ways. One way is to mark the end of the sequence with a sentinel value, which is simply a chosen value that is not used in the sequence. With character strings, it is common to use zero as a sentinel: the string continues from its start until a zero character is found. When a sentinel is used, the sentinel value cannot appear inside the string, since it marks the end.
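A minimal C sketch of the sentinel method. sentinel_length is a hypothetical helper that does essentially what the standard strlen() does; note that a zero placed inside the data silently cuts the string short.

```c
#include <stddef.h>

/* Sentinel-terminated string: walk forward until the zero character.
   This is essentially what the standard strlen() does. */
size_t sentinel_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* the sentinel marks the end */
        ++n;
    return n;
}
```

For example, sentinel_length("ab\0cd") is 2, because the embedded zero is taken as the end of the string.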

Another way to indicate the length is to keep it separately from the string. For example, the length is passed to the C memcmp routine as a separate parameter. This allows memcmp to compare arbitrary sequences of bytes in memory, including sequences that contain zero bytes.
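To illustrate the difference, here is a sketch of comparing two counted byte sequences with memcmp. compare_counted is a hypothetical name; the ordering rule for sequences of different lengths (shorter sorts first when the common prefix matches) is my own choice, not something memcmp does by itself.

```c
#include <stddef.h>
#include <string.h>

/* Compare two counted byte sequences. Because the lengths are passed
   separately, zero bytes inside the data are just ordinary bytes. */
int compare_counted(const void *a, size_t alen, const void *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int r = (n > 0) ? memcmp(a, b, n) : 0;
    if (r != 0)
        return r;
    /* Common prefix is equal: the shorter sequence sorts first. */
    return (alen > blen) - (alen < blen);
}
```

With "a\0b" and "a\0c", strcmp reports them equal (both look like "a" to it), while compare_counted(..., 3, ..., 3) sees the difference in the third byte.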

Sometimes the length is treated as part of the data structure for the string. It might be stored in the first byte or first several bytes of the string. Software using the string would then get the length by reading the first byte (or bytes), and the bytes after that would contain the characters of the string.
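A sketch of the length-prefix layout, assuming a single length byte (the style of classic Pascal strings, which limits the length to 255). The names pstr_length, pstr_chars and pstr_set are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed layout: byte 0 holds the length (0..255),
   bytes 1..len hold the characters. No terminator is stored. */
size_t pstr_length(const unsigned char *p) { return p[0]; }
const unsigned char *pstr_chars(const unsigned char *p) { return p + 1; }

/* Store len bytes from src into dst (capacity dstsize, counting the
   length byte). Returns 0 on success, -1 if it does not fit. */
int pstr_set(unsigned char *dst, size_t dstsize, const void *src, size_t len)
{
    if (len > 255 || len + 1 > dstsize)
        return -1;
    dst[0] = (unsigned char)len;   /* the length travels with the data */
    memcpy(dst + 1, src, len);     /* zero bytes are allowed in the data */
    return 0;
}
```

Reading the length is a single load instead of a scan, at the cost of a fixed maximum length.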

Another method, related to the sentinel method, is to use delimiters. For example, we commonly write strings such as "abc" in source code, text, and shell commands. The quote marks are delimiters that mark the beginning and end of each string. Various methods are used to allow the delimiters themselves to appear as characters in the strings, such as “quoting” (escaping) the delimiters with another special character, as in: "This is a quote mark: \"."
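A small sketch of reading a delimited string with backslash escapes, in the style of C string literals. parse_quoted is a hypothetical helper; it only handles the "escape the next character" rule, not sequences such as \n or \x41. The result it produces is counted (it returns the length) rather than null-terminated.

```c
#include <stddef.h>

/* Parse a string delimited by double quotes, where a backslash makes the
   next character literal (so \" gives a quote and \\ a backslash).
   Copies the characters into out (not null-terminated) and returns how
   many were written, or -1 on malformed input or if out is too small. */
long parse_quoted(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    if (*in != '"')
        return -1;               /* must start with the opening delimiter */
    for (++in; *in != '"'; ++in) {
        if (*in == '\0')
            return -1;           /* input ended before the closing quote */
        if (*in == '\\') {
            ++in;                /* take the next character literally */
            if (*in == '\0')
                return -1;
        }
        if (n == outsize)
            return -1;           /* no room left in out */
        out[n++] = *in;
    }
    return (long)n;
}
```

For the input "a\"b" (with its surrounding quotes), parse_quoted returns 3 and out holds a, ", b.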

In summary, the concept of a string that is not null-terminated is broad and open: any string whose length is indicated by some method other than marking its end with a null character is a non-null-terminated string.

In Windows kernel programming, the most commonly used string type is UNICODE_STRING, which is not null-terminated:

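typedef struct _UNICODE_STRING {
    USHORT Length;         // bytes currently in use, excluding any terminator
    USHORT MaximumLength;  // total size of Buffer in bytes
    PWSTR  Buffer;         // UTF-16 characters; not necessarily null-terminated
} UNICODE_STRING, *PUNICODE_STRING;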

The purpose of this data structure is to process strings efficiently as they are passed through a stack of drivers. Each driver in the stack may append text to the string, or modify it, within the range of MaximumLength without allocating a new buffer.
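To make the "append without allocating" idea concrete, here is a portable C sketch. counted_wstring and counted_append are hypothetical stand-ins (uint16_t instead of USHORT/WCHAR), not the DDK types or routines. The fail-instead-of-grow behavior is modeled loosely on RtlAppendUnicodeStringToString, which returns STATUS_BUFFER_TOO_SMALL when the result would not fit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-in for UNICODE_STRING. */
typedef struct {
    uint16_t  Length;         /* bytes currently used */
    uint16_t  MaximumLength;  /* capacity of Buffer in bytes */
    uint16_t *Buffer;         /* UTF-16 code units, not null-terminated */
} counted_wstring;

/* Append src to dst in place, inside dst's existing buffer.
   Returns 0 on success, -1 if the result would exceed MaximumLength. */
int counted_append(counted_wstring *dst, const counted_wstring *src)
{
    uint32_t needed = (uint32_t)dst->Length + src->Length;
    if (needed > dst->MaximumLength)
        return -1;            /* no reallocation: the caller must deal with it */
    memcpy((unsigned char *)dst->Buffer + dst->Length, src->Buffer, src->Length);
    dst->Length = (uint16_t)needed;   /* no terminator is written */
    return 0;
}
```

Because the length and capacity live in the header, appending is a bounds check plus one copy; no scan for a terminator is needed.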

For example, below is a typical UNICODE_STRING stored in a contiguous 64-byte buffer on a 64-bit system (Buffer is an 8-byte pointer, so bytes 4-7 are padding):

address + 0 : 22 (Length)
address + 2 : 48 (MaximumLength)
address + 8 : address + 16 (Buffer)
address + 16: "Hello World" (UTF-16 string, possibly without a null terminator)
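The layout above can be reproduced with a portable C sketch. counted_wstring and make_inline_string are hypothetical names; the sketch assumes ASCII input (each byte is widened to one UTF-16 code unit), and the header size is whatever sizeof gives on the platform (16 bytes on typical 64-bit ABIs, matching the offsets above).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable stand-in for UNICODE_STRING. */
typedef struct {
    uint16_t  Length;         /* bytes in use, no terminator counted */
    uint16_t  MaximumLength;  /* bytes available in Buffer */
    uint16_t *Buffer;         /* UTF-16 code units */
} counted_wstring;

/* Put the header and its character data in one malloc'd block of
   total_size bytes: header at offset 0, UTF-16 data right after it.
   No terminator is written. Returns NULL if it does not fit. */
counted_wstring *make_inline_string(size_t total_size, const char *ascii)
{
    size_t header = sizeof(counted_wstring);
    size_t nchars = strlen(ascii);
    if (total_size < header || total_size - header > UINT16_MAX ||
        nchars * 2 > total_size - header)
        return NULL;

    unsigned char *block = malloc(total_size);
    if (block == NULL)
        return NULL;

    counted_wstring *s = (counted_wstring *)block;
    s->Buffer = (uint16_t *)(block + header);   /* data follows the header */
    for (size_t i = 0; i < nchars; ++i)
        s->Buffer[i] = (unsigned char)ascii[i]; /* ASCII -> UTF-16 unit */
    s->Length = (uint16_t)(nchars * 2);         /* 11 chars -> 22 bytes */
    s->MaximumLength = (uint16_t)(total_size - header);
    return s;
}
```

With total_size 64 and "Hello World", Length comes out as 22 and, with a 16-byte header, MaximumLength as 48. The whole string is released with a single free().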

The standard C string functions cannot be used on UNICODE_STRING; use the Rtl*UnicodeString() functions instead.
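The reason is that functions like wcslen or wcscmp keep reading until they find a zero code unit, which may lie beyond the end of Buffer. A counted comparison only looks at Length bytes. Below is a hypothetical portable sketch, roughly what RtlEqualUnicodeString does when its CaseInSensitive argument is FALSE (the real routine also supports case-insensitive comparison).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-in for UNICODE_STRING. */
typedef struct {
    uint16_t  Length;         /* bytes in use */
    uint16_t  MaximumLength;  /* capacity of Buffer in bytes */
    uint16_t *Buffer;         /* UTF-16 code units, not null-terminated */
} counted_wstring;

/* Case-sensitive equality: only the first Length bytes of each Buffer
   matter; whatever lies past Length is ignored. Returns 1 or 0. */
int counted_equal(const counted_wstring *a, const counted_wstring *b)
{
    if (a->Length != b->Length)
        return 0;
    return a->Length == 0 || memcmp(a->Buffer, b->Buffer, a->Length) == 0;
}
```

Two strings with different MaximumLength values and different bytes after Length still compare equal if their first Length bytes match.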

It is easier to answer this by looking at when we can use non-null-terminated strings.

Some API functions take only a string pointer (SetWindowText, CreateFile), and those strings have to be terminated with a null character. Other functions (ExtTextOut, WriteConsole) take a pointer and some form of length (usually a number of chars, TCHARs or wchar_ts). These strings don't have to be terminated by a null character.

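#include <windows.h>
#include <tchar.h>

// No terminating NUL character below; hdc is assumed to be a valid device context.
TCHAR hello[] = { _T('H'), _T('E'), _T('L'), _T('L'), _T('O') };
ExtTextOut( hdc, 100, 100, 0, NULL, hello, 5, NULL );   // draws all 5 characters
TCHAR hello2[] = _T("HELLO WORLD!");
ExtTextOut( hdc, 100, 100, 0, NULL, hello2, 5, NULL );  // draws only "HELLO"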

In the second ExtTextOut call we don't have to artificially cut the hello2 string (or copy it to a temporary buffer). This function lets us use part of a string without any null-termination requirement.
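ExtTextOut needs Windows and a device context. As a portable sketch of the same pointer-plus-length idea, here is a small "string view". str_view, view_of, view_prefix and view_write are hypothetical names; fwrite stands in for ExtTextOut because it also takes an explicit byte count.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A (pointer, length) pair, like lpString plus a count for ExtTextOut. */
typedef struct {
    const char *ptr;
    size_t      len;
} str_view;

/* Wrap a null-terminated string; the terminator is not part of the view. */
str_view view_of(const char *s)
{
    str_view v = { s, strlen(s) };
    return v;
}

/* First n characters of v (or all of v if n is larger); nothing is copied. */
str_view view_prefix(str_view v, size_t n)
{
    if (n < v.len)
        v.len = n;
    return v;
}

/* Write exactly v.len bytes; the data need not be null-terminated.
   Returns the number of bytes written. */
size_t view_write(FILE *f, str_view v)
{
    return fwrite(v.ptr, 1, v.len, f);
}
```

As with the second ExtTextOut call, view_write(stdout, view_prefix(view_of("HELLO WORLD!"), 5)) prints only HELLO without modifying or copying the original string.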

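Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.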

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM