
cat/Xargs/command VS for/bash/command

Page 38 of the book Linux 101 Hacks suggests:

cat url-list.txt | xargs wget -c

I usually do:

for i in `cat url-list.txt`
   do
      wget -c $i
   done 

Is there something, other than length, where the xargs technique is superior to the good old for-loop technique in bash?

Added

The C source code seems to have only one fork. In contrast, how many forks does the bash combo have? Please elaborate on the issue.

xargs is designed to process multiple inputs for each process it forks. A shell script with a for loop over its inputs must fork a new process for each input. Avoiding that per-process overhead can give an xargs solution a significant performance enhancement.
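You can see the batching directly. A minimal sketch, using /bin/echo as a stand-in command: each /bin/echo invocation is one fork+exec and prints one line, so the line count equals the process count.

seq 1 1000 | xargs /bin/echo | wc -l                      # typically 1: xargs packs all inputs into one process
for i in $(seq 1 1000); do /bin/echo "$i"; done | wc -l   # 1000: one process per input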

From the Rationale section of a UNIX manpage for xargs. (Interestingly, this section doesn't appear in the OS X BSD version of xargs, nor in the GNU version.)

The classic application of the xargs utility is in conjunction with the find utility to reduce the number of processes launched by a simplistic use of the find -exec combination. The xargs utility is also used to enforce an upper limit on memory required to launch a process. With this basis in mind, this volume of POSIX.1-2008 selected only the minimal features required.
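In concrete terms, the three styles that rationale contrasts look like this (a sketch; gzip is just a placeholder command):

find . -name '*.log' -exec gzip {} \;   # one gzip process per file
find . -name '*.log' | xargs gzip       # batched: far fewer gzip processes
find . -name '*.log' -exec gzip {} +    # POSIX-standard batching without xargs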

In your follow-up, you ask how many forks the other version will have. Jim already answered this: one per iteration. How many iterations are there? It's impossible to give an exact number, but the general question is easy to answer: how many lines are there in your url-list.txt file?
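That count is easy to check:

wc -l < url-list.txt   # roughly how many wget processes the for-loop version will fork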

There are some other considerations. xargs requires extra care for filenames with spaces or other no-no characters, and find's -exec has an option (+) that groups processing into batches (see the find sketch above). So not everyone prefers xargs, and perhaps it's not best for all situations.
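For the whitespace problem, the usual remedy is the null-delimited pair: -print0 on find and -0 on xargs (a GNU/BSD extension rather than plain POSIX):

find . -name '*.bak' -print0 | xargs -0 rm --   # names with spaces or newlines survive intact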


Also consider:

xargs -I'{}' wget -c '{}' < url-list.txt

but wget provides an even better means to the same end:

wget -c -i url-list.txt

With respect to the xargs versus loop consideration, I prefer xargs when the meaning and implementation are relatively "simple" and "clear"; otherwise, I use loops.

xargs also lets you work with a huge list, which is not possible with the "for" version, because the shell is limited in command-line length.
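You can see both the limit and xargs working around it (a sketch; the exact numbers vary by system):

getconf ARG_MAX                           # kernel limit on one exec's argument list
seq 1 1000000 | xargs /bin/echo | wc -l   # several lines: xargs splits the input into several exec calls automatically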

Instead of GNU Parallel, I prefer using xargs' built-in parallel processing. Add -P to indicate how many forks to perform in parallel. As in...

 seq 1 10 | xargs -n 1 -P 3 echo

would run 3 forks in parallel, on 3 different cores if available. This is supported by modern GNU xargs; you will have to verify for yourself if you're using BSD or Solaris.
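Applied to the question's task, a sketch: three downloads at a time, one URL per wget invocation:

xargs -n 1 -P 3 wget -c < url-list.txt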

Depending on your internet connection, you may want to use GNU Parallel http://www.gnu.org/software/parallel/ to run it in parallel.

cat url-list.txt | parallel wget -c
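GNU Parallel defaults to one job per CPU core; -j caps the concurrency explicitly, e.g.:

cat url-list.txt | parallel -j4 wget -c   # at most 4 simultaneous downloads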

One advantage I can think of is that, if you have lots of files, it could be slightly faster since you don't have as much overhead from starting new processes.
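You can measure that overhead roughly (a sketch using bash's time keyword; /bin/true forces a real fork+exec each time):

time { seq 1 2000 | xargs /bin/true; }                 # a handful of processes
time { for i in $(seq 1 2000); do /bin/true; done; }   # 2000 separate processes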

I'm not really a bash expert though, so there could be other reasons it's better (or worse).
