
Parsing Command Line Arguments for Shell

I am writing a shell for a summer project, and I am trying to parse the command line. For example,

If I call

ls -l

I need to parse out the

-l

part so that I can pass it in the argument vector used by execv. I believe I am parsing it correctly, but for some reason something goes wrong when finding the directory. Am I perhaps missing something? Below is my code.

Although the strtok standard library function can be useful, you need to be aware of the shortcomings of its interface, which is basically a trap for the unwary.

In this program, you seem to have stumbled over both of the most common problems with the strtok interface. Please reread man strtok carefully in conjunction with this answer in order to avoid these problems in the future. Also, do not use strtok as an example of good interface design. Instead, use it as a model of what to avoid:

Hidden global state

strtok operates on a string pointer which it keeps in a static variable. Whenever you call strtok with a non-NULL first argument, it first resets the value of this static variable to that string. At the end of each call to strtok, it sets its static variable to the address at which the next scan should start, which is just after the token it just found.

There is only one instance of the static variable in the whole program, so you can't interleave strtok scans on two different strings. Worse, in the middle of a strtok scan you can't call a function that itself calls strtok, because the call inside the function will reset the strtok state.
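Where POSIX is available, strtok_r sidesteps the shared state by making the caller supply the save pointer. The following is only a minimal sketch of two scans running at the same time. The function name count_equal_tokens is made up for this illustration and is not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r on strict compilers */
#include <stddef.h>
#include <string.h>

/* Walk two strings in lockstep, taking one token from each per step,
 * and count the steps at which the two tokens are equal.
 * Each scan keeps its position in its own save pointer, so neither
 * scan disturbs the other. Both buffers are modified in place. */
size_t count_equal_tokens(char *a, const char *delims_a,
                          char *b, const char *delims_b)
{
    char *state_a = NULL;
    char *state_b = NULL;
    char *tok_a = strtok_r(a, delims_a, &state_a);
    char *tok_b = strtok_r(b, delims_b, &state_b);
    size_t equal = 0;

    while (tok_a != NULL && tok_b != NULL) {
        if (strcmp(tok_a, tok_b) == 0)
            equal++;
        tok_a = strtok_r(NULL, delims_a, &state_a);
        tok_b = strtok_r(NULL, delims_b, &state_b);
    }
    return equal;
}
```

Writing the same loop with plain strtok would not work, because the second strtok call would overwrite the position saved by the first.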

That means you have to be careful whenever you have more than one strtok scan in a program. In your case, after the initialization of the badly-named variable env:

token = strtok(env, ":");

you use strtok to divide your input command into pieces, starting with the badly-named variable argv:

argv = strtok(buf_copy, " ");

so when you later want to find the next component of env:

token = strtok(NULL, ":");

strtok's state no longer points into env; instead it points into buf_copy (and, with your particular input, at a point in buf_copy where no more tokens will be found).
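One way to avoid this collision is to finish the command scan completely before any other strtok scan begins, storing every piece in the vector that execv will receive. This is a sketch under that assumption. The name split_args and the whitespace delimiter set are choices made for this example, not taken from your code.

```c
#include <stddef.h>
#include <string.h>

/* Split `line` in place into whitespace-separated arguments.
 * Stores at most max_args - 1 pointers in `args` and terminates the
 * vector with NULL, as execv expects. Returns the number of arguments
 * stored. The whole scan runs to completion inside this function, so
 * no other strtok scan may be in progress when it is called. */
size_t split_args(char *line, char **args, size_t max_args)
{
    size_t count = 0;

    if (max_args == 0)
        return 0;

    for (char *tok = strtok(line, " \t\n");
         tok != NULL && count + 1 < max_args;
         tok = strtok(NULL, " \t\n")) {
        args[count++] = tok;
    }
    args[count] = NULL;
    return count;
}
```

For the input ls -l this yields a vector holding "ls", "-l" and a terminating NULL, and the scan of the search path can start afterwards without interference.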

Modification of an input argument

The first argument to strtok is a char*, not a const char*.

In general, if a library function has a string argument, the argument should be declared as const char* unless the function intends to modify the string. Or, to put it another way, a const char* declaration is a promise that no attempt will be made to modify the argument, and if the promise is not made, it's probably for a good reason.

And, indeed, if you read strtok's documentation, you will see that it explicitly modifies its input string by overwriting some delimiter characters with a NUL character. This has the effect of permanently dividing the original string into separate tokens. Sometimes that's fine, but it can get you into a lot of trouble if you want to refer to the string's original value again in the future. Often you will find yourself making a copy of the original string in order to call strtok on it. (That's often a symptom of bad program design, or a signal that strtok wasn't really the right tool to use for parsing.)
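When the original string has to stay intact, the standard functions strspn and strcspn can locate tokens without writing to it. The helper below is a hypothetical sketch (next_token is not a library function). It reports each token as a pointer plus a length instead of inserting NUL characters.

```c
#include <stddef.h>
#include <string.h>

/* Find the next token in `s` without modifying the string.
 * On success, stores the token's length in *len and returns a pointer
 * to its first character; returns NULL when no token remains.
 * To continue the scan, call again with (returned pointer + *len).
 * Like strtok, this skips runs of delimiters, so empty fields are
 * never reported. */
const char *next_token(const char *s, const char *delims, size_t *len)
{
    s += strspn(s, delims);        /* skip leading delimiters */
    if (*s == '\0')
        return NULL;
    *len = strcspn(s, delims);     /* measure up to the next delimiter */
    return s;
}
```

Because the tokens are not NUL-terminated, they have to be used with length-aware calls such as strncmp or the "%.*s" printf conversion, or copied out when a terminated string is needed.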

In this particular program, the trap is that getenv() does not return a copy of the environment variable's value. It returns a pointer directly into the environment variable table. Although the return type of getenv is char*, which might lead you to believe that modifying the value is ok, the C standard clearly tells you not to:

The string pointed to shall not be modified by the program

Unfortunately, this prohibition is not present in the Linux manpage for getenv, but that manpage does note that getenv gives you a pointer into the environment table. If you do modify the string returned by getenv, it is highly likely (though not guaranteed) that a subsequent call to getenv for the same environment variable will retrieve the modified value.

And that's precisely what you do: since you let strtok loose on the string returned by getenv("PATH"), a subsequent call to getenv("PATH") will see a value truncated at the first colon.
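A common way around this is to tokenize a private copy of the value and leave the environment alone. The sketch below assumes a POSIX system (strdup, strtok_r and access). The name find_in_path and its parameters are invented for this illustration and do not come from your program.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strdup and strtok_r */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Search the colon-separated list `path_value` for an entry named `cmd`
 * that the caller has execute permission on.
 * On success, writes the full path to `out` and returns 1.
 * Returns 0 if nothing was found; the contents of `out` are then
 * unspecified. Only a private copy is tokenized, so the caller's
 * string (for example the value returned by getenv) is never modified. */
int find_in_path(const char *path_value, const char *cmd,
                 char *out, size_t out_size)
{
    char *copy = strdup(path_value);
    if (copy == NULL)
        return 0;

    int found = 0;
    char *state = NULL;
    for (char *dir = strtok_r(copy, ":", &state);
         dir != NULL;
         dir = strtok_r(NULL, ":", &state)) {
        int n = snprintf(out, out_size, "%s/%s", dir, cmd);
        /* Skip candidates that did not fit into the output buffer. */
        if (n > 0 && (size_t)n < out_size && access(out, X_OK) == 0) {
            found = 1;
            break;
        }
    }
    free(copy);
    return found;
}
```

A caller would check that getenv("PATH") is not NULL and then pass the result straight in, for example find_in_path(path, "ls", buf, sizeof buf). Every later getenv("PATH") call still sees the full value because only the copy was cut up.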
