Binary data as command line argument

I have a simple C++ program (and a similar one for C) that just prints out its first argument:

#include <iostream>

int main(int argc, char** argv)
{
    if(argc > 1)
        std::cout << ">>" << argv[1] << "<<\n";
}

I can pass binary data as an argument (I have tried this in bash), like this:

$ ./a.out $(printf "1\x0123")
>>1?23<<

If I try to pass a null byte there, I get:

$ ./a.out $(printf "1\x0023")
bash: warning: command substitution: ignored null byte in input
>>123<<

Clearly bash(?) does not allow this.

But is it possible to send a null byte as a command line argument this way? Does either C or C++ put any restrictions on this?

Edit: I am not using this in day-to-day C++; this question is just out of curiosity.

This answer is written in C, but can be compiled as C++ and works the same in both. I quote from the C11 standard; there are equivalent definitions in the C++ standards.

There isn't a good way to pass null bytes to a program's arguments

C11 §5.1.2.2.1 Program startup:
If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.

C11 §7.1.1 Definitions of terms:
A string is a contiguous sequence of characters terminated by and including the first null character.

That means that each argument passed to main() in argv is a null-terminated string. There is no reliable data after the null byte at the end of the string; reading past it would access memory outside the bounds of the string.

So, as noted at length in the comments to the question, it is not possible in the ordinary course of events to get null bytes to a program via the argument list, because a null byte is interpreted as the end of an argument.
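To make the truncation concrete, here is a minimal sketch of how much of a raw byte sequence survives when it is handed over as a single argument string. The function name bytes_seen_as_argument is hypothetical and used only for illustration; it models the rule quoted above rather than anything a shell or the operating system actually calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given `size` raw bytes, return how many of them a
 * program would see if the bytes were passed as one argument string.
 * A string ends at its first null byte, so everything after it is lost. */
size_t bytes_seen_as_argument(const char *raw, size_t size)
{
    assert(raw != NULL);
    const char *nul = memchr(raw, '\0', size);
    return (nul != NULL) ? (size_t)(nul - raw) : size;
}
```

For the bytes 31 01 32 33 from the question, all four bytes are visible; for 31 00 32 33, only the first one is. When writing such data as a C string literal, split it as "1\x01" "23", because a \x escape in C consumes every hex digit that follows it.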

By special agreement

That doesn't leave much wriggle room. However, if the invoking program and the invoked program agree on a convention, then, even with the limitations imposed by the standards, you can pass arbitrary binary data, including arbitrary sequences of null bytes, to the invoked program, up to the limit the implementation imposes on the length of an argument list.

The convention has to be along the lines of:

  • All arguments (except argv[0], which is ignored, and the last argument, argv[argc-1]) consist of a stream of non-null bytes followed by a null. The last argument contributes only its non-null bytes; its terminating null is dropped.
  • If you need adjacent nulls, you have to provide empty arguments on the command line.
  • If you need trailing nulls, you have to provide empty arguments as the last arguments on the command line.
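The calling side of such a convention can be sketched as follows: split a byte buffer at every null byte, so that each null in the data becomes the terminator of one argument. This is only an illustration under the rules listed above; split_at_nulls is a hypothetical helper name, and the caller is assumed to pass the resulting strings on as argv[1] onwards (for example through one of the exec functions).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `size` bytes of binary data into strings that
 * follow the convention above. Returns a NULL-terminated array of pointers
 * and stores the number of strings in *count; returns NULL on allocation
 * failure. Release with free(result[0]) followed by free(result). */
char **split_at_nulls(const char *data, size_t size, size_t *count)
{
    assert(data != NULL && count != NULL);

    /* One argument per null byte in the data, plus the final piece. */
    size_t nargs = 1;
    for (size_t i = 0; i < size; i++)
        if (data[i] == '\0')
            nargs++;

    char **args = malloc((nargs + 1) * sizeof(*args));
    char *copy = malloc(size + 1);
    if (args == NULL || copy == NULL)
    {
        free(args);
        free(copy);
        return NULL;
    }

    /* Copy the data and add one terminator for the final piece. */
    memcpy(copy, data, size);
    copy[size] = '\0';

    /* Each piece starts at the beginning or right after a null byte. */
    size_t n = 0;
    args[n++] = copy;
    for (size_t i = 0; i < size; i++)
        if (copy[i] == '\0')
            args[n++] = copy + i + 1;
    args[n] = NULL;

    *count = nargs;
    return args;
}
```

A buffer with k null bytes always yields k + 1 strings. Data that ends in a null byte therefore produces an empty final string, which is exactly the trailing empty argument the third rule asks for.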

Following this convention, the receiving program could look like this (null19.c):

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void hex_dump(const char *tag, size_t size, const char *buffer);

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "Usage: %s arg1 [arg2 '' arg4 ...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

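    /* Total size: every argument plus its terminating null byte. */
    size_t len_args = 0;
    for (int i = 1; i < argc; i++)
        len_args += strlen(argv[i]) + 1;

    /* Variable-length array: standard in C99, optional in C11, and only a
       compiler extension in C++. */
    char buffer[len_args];

    /* Copy each argument, including its null terminator, into the buffer. */
    size_t offset = 0;
    for (int i = 1; i < argc; i++)
    {
        size_t arglen = strlen(argv[i]) + 1;
        memmove(buffer + offset, argv[i], arglen);
        offset += arglen;
    }
    /* Drop the terminator of the last argument: it is not part of the data. */
    assert(offset != 0);
    offset--;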

    hex_dump("Argument list", offset, buffer);
    return 0;
}

static inline size_t min_size(size_t x, size_t y) { return (x < y) ? x : y; }

static void hex_dump(const char *tag, size_t size, const char *buffer)
{
    printf("%s (%zu):\n", tag, size);
    size_t offset = 0;
    while (size != 0)
    {
        printf("0x%.4zX:", offset);
        size_t count = min_size(16, size);
        for (size_t i = 0; i < count; i++)
            printf(" %.2X", buffer[offset + i] & 0xFF);
        putchar('\n');
        size -= count;
        offset += count;
    }
}

This could be invoked using:

$ ./null19 '1234' '5678' '' '' '' '' 'def0' ''
Argument list (19):
0x0000: 31 32 33 34 00 35 36 37 38 00 00 00 00 00 64 65
0x0010: 66 30 00
$

The first argument is deemed to consist of 5 bytes: four digits and a null byte. The second is similar. The third through sixth arguments each represent a single null byte (it gets painful if you need large numbers of contiguous null bytes), then there is another string of five bytes (three letters, one digit, one null byte). The last argument is empty but ensures that there is a null byte at the end. If it is omitted, the output does not include that final null byte:

$ ./null19 '1234' '5678' '' '' '' '' 'def0' 
Argument list (18):
0x0000: 31 32 33 34 00 35 36 37 38 00 00 00 00 00 64 65
0x0010: 66 30
$

This is the same as before except that there is no trailing null byte in the data. The two examples in the question are easily handled:

$ ./null19 $(printf "1\x0123")
Argument list (4):
0x0000: 31 01 32 33
$ ./null19 1 23
Argument list (4):
0x0000: 31 00 32 33
$

This works strictly within the standard, assuming only that empty strings are recognized as valid arguments. In practice, the arguments are usually already contiguous in memory, so on many platforms it might be possible to skip copying them into the buffer. However, the standard does not stipulate that the argument strings are laid out contiguously in memory.
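Whether the copy could be skipped on a particular platform can be checked at run time. A minimal sketch, assuming nothing beyond the standard library; args_are_contiguous is a hypothetical helper and not part of null19.c:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if every string in `args` starts directly
 * after the null terminator of the previous one, and 0 otherwise. */
int args_are_contiguous(size_t count, char *const *args)
{
    assert(args != NULL);
    for (size_t i = 0; i + 1 < count; i++)
    {
        /* One past the terminator of args[i]. */
        const char *end = args[i] + strlen(args[i]) + 1;
        if (end != args[i + 1])
            return 0;
    }
    return 1;
}
```

In main() it would be called as args_are_contiguous((size_t)argc - 1, argv + 1). A result of 1 only describes the current run on the current platform; portable code still needs the copying version.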

If you need multiple arguments with binary data, you can modify the convention. For example, you could use a control argument: a string that indicates how many of the following physical arguments make up one logical binary argument.
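One possible shape for such a control argument is a decimal count placed in front of each group. The sketch below is an assumption about how that could look, not an established format; join_pieces and next_logical_arg are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join `count` strings into `out`, keeping the null byte between pieces and
 * dropping the terminator of the last piece. Returns the bytes of data.
 * `out` must have room for every piece plus its terminator. */
size_t join_pieces(char *const *pieces, size_t count, char *out)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; i++)
    {
        size_t len = strlen(pieces[i]) + 1;
        memcpy(out + offset, pieces[i], len);
        offset += len;
    }
    return (count == 0) ? 0 : offset - 1;
}

/* Read one logical argument starting at args[*index]: a decimal count
 * followed by that many physical arguments. On success, store the data in
 * `out`, its length in *out_len, advance *index past the group and return 0.
 * Return -1 if the count is malformed or too few arguments remain. */
int next_logical_arg(char *const *args, size_t nargs, size_t *index,
                     char *out, size_t *out_len)
{
    assert(args != NULL && index != NULL && out != NULL && out_len != NULL);
    if (*index >= nargs)
        return -1;

    char *end;
    unsigned long count = strtoul(args[*index], &end, 10);
    if (end == args[*index] || *end != '\0' || count > nargs - *index - 1)
        return -1;

    *out_len = join_pieces(args + *index + 1, (size_t)count, out);
    *index += (size_t)count + 1;
    return 0;
}
```

With the arguments 3 'ab' '' 'cd' 1 'xyz', the first call yields the six bytes 61 62 00 00 63 64 and the second call yields 78 79 7A. In main() the helper would be called with argv + 1 and (size_t)argc - 1.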

All this relies on both programs interpreting the argument list as agreed. It is not really a general solution.
