
Parsing Command Line Arguments for Shell

I am writing a shell for a summer project, and I am trying to parse the command line. For example, if I call

ls -l

I need to parse the -l part so that I can pass it in the argument vector that is used by execv. I know I am parsing it correctly, but for some reason something goes wrong when finding the directory. Am I perhaps missing something? Below is my code.

Although the strtok standard library function can be useful, you need to be aware of the shortcomings of its interface, which is basically a trap for the unwary.

In this program, you seem to have stumbled over both of the most common problems with the strtok interface. Please reread man strtok carefully in conjunction with this answer in order to avoid falling into these problems in the future. Also, do not use strtok as an example of good interface design. Instead, use it as a model for what to avoid:

Hidden global state

strtok operates on a string pointer which it keeps in a static variable. Whenever you call strtok with a non-NULL first argument, it first resets the value of this static variable to that string. At the end of each call to strtok, it sets its static variable to the address at which the next scan should start, which is just after the token it just found.

There is only one instance of the static variable in the whole program, so you can't interleave strtok scans on two different strings. Worse, you can't call a function which itself calls strtok inside a strtok scan of a string, because the call inside the function will reset the strtok state.

That means you have to be careful whenever you have more than one strtok scan in a program. In your case, after the initialization of the badly-named variable env:

token = strtok(env, ":");

you use strtok to divide your input command into pieces in the badly-named variable argv:

argv = strtok(buf_copy, " ");

so when you later want to find the next component of env:

token = strtok(NULL, ":");

strtok's state no longer points into env; instead it points into buf_copy (and, with your particular input, at a point in buf_copy where no more tokens will be found).
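One common way around the shared state is the POSIX function strtok_r, which stores the scan position in a pointer supplied by the caller instead of a hidden static variable. The following is only a sketch, not your program: split_args and count_dirs_while_splitting are hypothetical names invented for illustration, and the delimiter sets are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on blanks into at most
 * max_args - 1 tokens and terminate the vector with NULL, as execv expects.
 * Returns the number of tokens stored. */
size_t split_args(char *line, char **args, size_t max_args)
{
    assert(line != NULL && args != NULL && max_args > 0);

    size_t n = 0;
    char *save = NULL;   /* scan state lives here, not in a hidden static */

    for (char *tok = strtok_r(line, " \t\n", &save);
         tok != NULL && n + 1 < max_args;
         tok = strtok_r(NULL, " \t\n", &save)) {
        args[n++] = tok;
    }
    args[n] = NULL;
    return n;
}

/* Hypothetical demonstration: walk a colon-separated list and, in the middle
 * of that scan, split a command line. Each scan has its own save pointer,
 * so neither disturbs the other. Returns the number of directories seen. */
size_t count_dirs_while_splitting(char *path_copy, char *line,
                                  char **args, size_t max_args)
{
    size_t dirs = 0;
    char *path_save = NULL;

    for (char *dir = strtok_r(path_copy, ":", &path_save);
         dir != NULL;
         dir = strtok_r(NULL, ":", &path_save)) {
        if (dirs == 0) {
            split_args(line, args, max_args);   /* nested scan */
        }
        dirs++;
    }
    return dirs;
}
```

With plain strtok, the nested call would have redirected the outer scan into the command line, which is the failure described above. Both functions still modify the buffers they are given, so they should only be handed writable copies.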

Modification of an input argument

The first argument to strtok is a char*, not a const char*.

In general, if a library function has a string argument, the argument should be declared as const char* unless the function intends to modify the string. Or, to put it another way, a const char* declaration is a promise that no attempt will be made to modify the argument, and if the promise is not made, it's probably for a good reason.

And, indeed, if you read strtok's documentation, you will see that it explicitly modifies its input string by overwriting some delimiter characters with a NUL character. This has the effect of permanently dividing the original string into separate tokens. Sometimes that's fine, but it can get you into a lot of trouble if you want to refer to the string's original value again in the future. Often you will find yourself making a copy of the original string in order to call strtok on it. (That's often a symptom of bad program design, or a signal that strtok wasn't really the right tool to use for parsing.)
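The overwriting is easy to observe directly. This small sketch uses a hypothetical helper, visible_length_after_first_token, which is not part of your program; it assumes the buffer is writable and does not begin with a delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenize buf in place once, then report how long the
 * string starting at buf appears to be afterwards. */
size_t visible_length_after_first_token(char *buf, const char *delims)
{
    assert(buf != NULL && delims != NULL);

    strtok(buf, delims);   /* writes '\0' over the delimiter that ends the first token */
    return strlen(buf);
}
```

Called on a writable array holding "ls -l" with " " as the delimiter, the function returns 2: the space at index 2 has been replaced by a NUL, so the buffer now reads as "ls", and "-l" is only reachable through a pointer to index 3.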

In this particular program, the trap is that getenv() does not return a copy of the environment variable's value. It returns a pointer directly into the environment variable table. Although the return type of getenv is char*, which might lead you to believe that modifying the value is ok, the C standard clearly tells you not to:

The string pointed to shall not be modified by the program

Unfortunately, this prohibition is not present in the Linux manpage for getenv, but that manpage does note that getenv gives you a pointer into the environment table. If you do modify the string returned by getenv, it is highly likely (though not guaranteed) that a subsequent call to getenv for the same environment variable will retrieve the modified value.

And that's precisely what you do: since you let strtok loose on the string returned by getenv("PATH"), a subsequent call to getenv("PATH") will see a value truncated at the first colon.
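A way to avoid both traps is to tokenize a private copy of the value and keep the scan state local. The sketch below is one possible shape, not a drop-in fix for your code: find_in_path is a hypothetical function, it assumes a POSIX system (strdup, strtok_r, access), and it treats anything that passes access(..., X_OK) as a match, which would also accept a directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: search the colon-separated list path_value for an
 * executable named cmd. On success, writes the full path into out and
 * returns 0; otherwise returns -1 (out may then hold a rejected candidate).
 * path_value is const: only a private copy is ever modified. */
int find_in_path(const char *path_value, const char *cmd,
                 char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    if (path_value == NULL || cmd == NULL) {
        return -1;
    }

    char *copy = strdup(path_value);   /* strtok_r will write into this copy */
    if (copy == NULL) {
        return -1;
    }

    int result = -1;
    char *save = NULL;                 /* independent scan state */
    for (char *dir = strtok_r(copy, ":", &save);
         dir != NULL;
         dir = strtok_r(NULL, ":", &save)) {
        int len = snprintf(out, out_size, "%s/%s", dir, cmd);
        if (len < 0 || (size_t)len >= out_size) {
            continue;                  /* candidate did not fit in out */
        }
        if (access(out, X_OK) == 0) {
            result = 0;
            break;
        }
    }

    free(copy);
    return result;
}
```

A shell could call find_in_path(getenv("PATH"), args[0], full, sizeof full) and, on a return value of 0, pass full and args to execv. Because the environment string is only read, a later getenv("PATH") still sees the whole value.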
