
How does a program inherit environment variables?

When I use the function getenv() from the Standard C Library, my program inherits the environment variables from its parent.

Example:

$ export FOO=42
$ <<< 'int main() {printf("%s\n", getenv("FOO"));}' gcc -w -xc - && ./a.exe
42

In libc, the environ variable is defined in environ.c. I expected it to be empty at execution, but I get 42.

Going a bit further, getenv can be simplified as follows:

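/* Simplified from glibc's getenv; assumes name has at least two characters. */
char * getenv (const char *name)
{
    size_t len = strlen (name);
    char **ep;
    uint16_t name_start;

    /* Compare the first two bytes of the name as one 16-bit value. */
    name_start = *(const uint16_t *) name;
    len -= 2;
    name += 2;

    /* __environ is the libc-internal name for environ. */
    for (ep = __environ; *ep != NULL; ++ep)
    {
        uint16_t ep_start = *(uint16_t *) *ep;

        if (name_start == ep_start && !strncmp (*ep + 2, name, len)
                && (*ep)[len + 2] == '=')
            return &(*ep)[len + 3];
    }
    return NULL;
}
libc_hidden_def (getenv)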

Here I simply get the content of the __environ variable. However, I never initialized it.

So I am confused: environ should be NULL unless my main function is not the real entry point of my program. Perhaps gcc is tricking me by adding an _init function that is part of the standard C library.

Where is environ initialized?

The environment variables are passed down from the parent process as a third argument to main. The easiest way to discover this is to read the documentation for the system call execve, particularly this bit:

 int execve(const char *filename, char *const argv[], char *const envp[]); 

Description

execve() executes the program pointed to by filename. [...] argv is an array of argument strings passed to the new program. By convention, the first of these strings should contain the filename associated with the file being executed. envp is an array of strings, conventionally of the form key=value, which are passed as environment to the new program. Both argv and envp must be terminated by a NULL pointer. The argument vector and environment can be accessed by the called program's main function, when it is defined as:

 int main(int argc, char *argv[], char *envp[]) 

The C library copies the envp argument into the environ global variable somewhere in its startup code, before it calls main: for instance, GNU libc does this in _init and musl libc does it in __init_libc. (You may find musl libc's code easier to trace through than GNU libc's.) Conversely, if you start a program using one of the exec wrapper functions that don't take an explicit environment vector, the C library supplies environ as the third argument to execve. Inheritance of environment variables is thus strictly a user-space convention. As far as the kernel is concerned, each program receives two argument vectors, and it doesn't care what's in them.

(Note that three-argument main is an extension to the C language. The C standard only specifies int main(void) and int main(int argc, char **argv), but it permits implementations to define additional forms (C11 Annex J.5.1, Environment arguments). The three-argument main has been how environment variables work since Unix V7, if not longer, and is documented by Microsoft too; see "What should main() return in C and C++?".)

There is no mystery here.

First, the shell forks. The forked process obviously has the same environment. Then a new program is executed in the child. The syscall in question is execve, which among other things accepts a pointer to an environment.

So what environment is set after exec'ing a binary depends entirely on the code that did the exec.

All of this can easily be seen by running strace.

EDIT: since the question was edited to ask about environ:

When you execute a dynamically linked binary, the very first userspace code to run comes from the loader. Among other things, the loader sets up variables like argc, argv, and environ, and only then calls main() from the binary.

Once more, sources for all this are freely available. While glibc's sources are rather hard to read due to atrocious formatting, the BSD ones are easy to read and conceptually equivalent.

http://code.metager.de/source/xref/freebsd/libexec/rtld-elf/rtld.c#389

Under Linux, when a program starts, its arguments and environment variables are stored on the stack. For C programs, the code that executes before main looks at this, builds the argv and envp arrays of pointers, and then calls main with these values (and argc).

When a program calls execvpe to turn into a new program (often after calling fork), an envp is passed in along with an argv. The kernel copies the data these point to onto the new program's stack.

When any of the other exec functions is called, glibc passes the current program's environ as the new program's envp to execvpe (or directly to the execve system call).

The real question is: how does the shell run commands?

The answer is by creating a new process, probably using fork() and execl(), which creates a process with the same environment as the current process.

You can, however, create a new process with a custom environment using execvpe() / execle().

But in any normal situation that isn't necessary, especially since many programs expect some environment variables, such as PATH, to be defined. Normally a child process inherits the environment variables from the environment where it is invoked.

The parent process that calls your program (your shell) defines FOO. The newly created process receives a copy from the parent.
