
Weird behavior when prepending to a file with cat and tee

One solution to the question "prepend to a file one liner shell?" is:

cat header main | tee main > /dev/null

As some of the comments pointed out, this doesn't work for large files.

Here's an example where it works:

$ echo '1' > h
$ echo '2' > t
$ cat h t | tee t > /dev/null
$ cat t
1
2

And where it breaks:

$ head -1000 /dev/urandom > h
$ head -1000 /dev/urandom > t
$ cat h t | tee t > /dev/null
^C

The command hangs, and after killing it we are left with:

$ wc -l t
7470174 t

What causes the behavior above, where the command gets stuck and keeps adding lines indefinitely? What is different in the one-line-file scenario?

The behavior is completely non-deterministic. When you run cat header main | tee main > /dev/null, the following things happen:

1) cat opens header
2) cat opens main
3) cat reads header and writes its content to stdout
4) cat reads main and writes its content to stdout
5) tee opens main for writing, truncating it
6) tee reads stdin and writes the data read into main

The ordering above is one possible ordering, but these events may occur in many different orders. 5 must precede 6, 2 must precede 4, and 1 must precede 3, but it is entirely possible for the ordering to be 5, 1, 3, 2, 4, 6. In any case, if the files are large, it is very likely that step 5 will take place before step 4 is complete, which will cause portions of data to be discarded. It is entirely possible that step 5 happens before step 4 even begins, in which case all of the data previously in main will be lost.
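The 5, 1, 3, 2, 4, 6 ordering can be forced to make the data loss reproducible. In the sketch below, the right-hand side of the pipeline is a hypothetical stand-in for tee: it performs step 5 (open t with truncation) immediately, waits, and only then performs step 6 (copy stdin into t). The left-hand side delays cat by one second. The sleep durations are assumptions about scheduling, not guarantees, and the demo relies on the tiny header fitting in the pipe buffer so cat can finish before anything reads the pipe.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
echo '1' > h
echo '2' > t

# Left: wait, then do steps 1-4 (cat opens and reads h, then t).
# Right: stand-in for tee. Step 5 (truncate t) happens at once; after a
# pause long enough for cat to finish, step 6 copies the pipe into t.
{ sleep 1; cat h t; } | { exec 3>t; sleep 2; cat >&3; }

cat t    # only the header survives; the original "2" is gone
```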

The particular case that you are seeing is very likely the result of cat blocking on a write (the pipe is full) and going to sleep before it has finished reading its input. tee then writes more data to t and tries to read from the pipe, then goes to sleep until cat writes more data. cat writes another buffer, tee puts it into t, and the cycle repeats, with cat re-reading the data that tee is writing into t.
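The feedback loop rests on a basic property of regular files: a reader that has not yet reached end-of-file also sees bytes that another writer adds to the file while it is still reading. The minimal sketch below (hypothetical file name t, scratch directory) shows that effect without a pipeline: a read descriptor is opened, the file then grows, and cat returns the new data too. In the hanging pipeline, tee is the writer that keeps extending t, and cat keeps finding fresh data at its end.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
echo 'original' > t
exec 3<t                # open t for reading; the offset starts at 0
echo 'appended' >> t    # a second writer extends the file
out=$(cat <&3)          # cat reads from offset 0 up to the *current* end of file
exec 3<&-
printf '%s\n' "$out"    # prints both lines
```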

cat header main | tee main > /dev/null

That is a terrible, terrible idea. You should never have a pipeline both reading from and writing to a file.

You can put the result in a temporary file first, and then move it into place:

cat header main >main.new && mv main{.new,}
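The temp-file approach can also be wrapped in a small function. This is a sketch, not a standard tool: the name prepend is hypothetical, the temporary file is created with mktemp next to the target so that mv is a rename within one filesystem, and the temporary file is removed if cat fails. It sticks to POSIX sh (the main{.new,} brace expansion above needs bash or zsh). One caveat: mktemp creates the file with mode 0600, so the result does not keep the original file's permissions.

```shell
#!/bin/sh
# prepend HEADER FILE (hypothetical helper): write HEADER followed by
# FILE into a temporary file, then rename it over FILE.
prepend() {
    header=$1
    target=$2
    # Temporary file in the target's directory, so mv is a rename.
    tmp=$(mktemp "$target.XXXXXX") || return 1
    if cat "$header" "$target" > "$tmp"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"    # do not leave a half-written file behind
        return 1
    fi
}

# Usage, in a scratch directory:
cd "$(mktemp -d)" || exit 1
echo '1' > h
echo '2' > t
prepend h t
cat t
```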

Or, to minimize the time during which two copies of the file exist, and to never have both visible in the directory at once, you can delete the original after opening it for reading and write the new file directly into its previous location. However, this does mean there is a brief gap during which the file doesn't exist at all.

exec 3<main && rm main && cat header - <&3 >main && exec 3<&-
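Spelled out step by step, the descriptor trick works because rm only removes the directory entry; the file's data stays readable through any descriptor that is still open. The sketch below repeats the one-liner with comments and checks the result in a scratch directory (sample contents are made up). Note that the new main is a different inode, so hard links to the old file and its permissions are not carried over.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
echo 'head' > header
echo 'body' > main

exec 3<main &&             # keep the original data reachable via fd 3
rm main &&                 # drop the name; the data survives while fd 3 is open
cat header - <&3 >main &&  # '-' is stdin, redirected from fd 3; >main creates a new file
exec 3<&-                  # close fd 3; the old data can now be freed
cat main
```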
