
Fill missing line numbers into file using sed / awk / bash

I have a (tab-delimited) file where the first "word" on each line is the line number. However, some line numbers are missing. I want to insert new lines (with corresponding line number) so that throughout the file, the number printed on the line matches the actual line number. (This is for later consumption into readarray with cut/awk to get the line after the line number.)

I've written this logic in python and tested it works, however I need to run this in an environment that doesn't have python. The actual file is about 10M rows. Is there a way to represent this logic using sed, awk, or even just plain shell / bash?

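import re
import sys

# Matches the leading line number on each input line.
linenumre = re.compile(r"^\d+")
i = 0
for line in sys.stdin:
    i = i + 1
    linenum = int(linenumre.findall(line)[0])

    # Emit a placeholder line for every missing number.
    while i < linenum:
        print(i)
        i = i + 1

    print(line, end='')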

test file looks like:

1   foo 1
2   bar 1
4   qux 1
6   quux    1
9       2
10  fun 2

expected output like:

1   foo 1
2   bar 1
3
4   qux 1
5
6   quux    1
7
8
9       2
10  fun 2

Like this, with awk:

awk '{while(++ln!=$1){print ln}}1' input.txt

Explanation, as a multiline script:

{

    # Loop as long as the counter ln (the expected line number)
    # is not equal to the first column, printing each missing
    # line number on a line of its own.

    # Note: awk treats an uninitialized variable as 0 in a
    # numeric context, so the first ++ln yields 1

    while(++ln!=$1) {
        print ln
    }
}

1 # this pattern is always true, so awk prints every input line
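
One caveat with the != comparison: if a line number is ever lower than the counter (a duplicate or out-of-order number), the loop never terminates. A minimal sketch of a more defensive variant follows, assuming tab-delimited input in which every line starts with a number. The function name fill_gaps is hypothetical, and resetting the counter to the number just read is a design choice, not part of the original answer.

```shell
# fill_gaps: hypothetical helper name; reads the numbered file on stdin.
fill_gaps() {
    awk -F'\t' '
        {
            # Print every missing number below the current line number.
            # "<" (not "!=") ends the loop even when $1 is lower than ln.
            while (++ln < $1 + 0) print ln
            # Resynchronise the counter with the number just read
            # (assumption: the number in the file is authoritative).
            ln = $1 + 0
            print
        }'
}
```

Usage: fill_gaps < input.txt > filled.txt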

I've written this logic in python and tested it works, however I need to run this in an environment that doesn't have python.

In case you want to run Python code where Python is not installed, you can freeze your code into a standalone executable. The Hitchhiker's Guide to Python has an overview of tools that can do this. I suggest trying PyInstaller first, as it supports several operating systems and seems easy to use.

This might work for you (GNU join, seq and sed):

join -a1 -t' ' <(seq $(sed -n '$s/ .*//p' file)) file 2>/dev/null

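This joins the output of seq, which lists every number from 1 up to the last line number in file (extracted by sed), with file itself. The -a1 option also prints the numbers that have no match in file, which fills the gaps. The separator given to -t and used in the sed expression has to be the file's actual delimiter, so for a tab-delimited file it must be a literal tab. Note that join expects lexically sorted input; 2>/dev/null hides the diagnostic it may print for numerically ordered keys such as 9 followed by 10, and lines may be dropped when a gap sits right before such a boundary (for example, 9 missing and 10 present).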
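
For completeness, the Python loop from the question can also be mirrored in plain shell without awk, sed or join. This is only a sketch: the function name fill_gaps_sh is hypothetical, and it assumes tab-delimited input in which every line starts with a plain decimal number. A shell read loop is likely to be much slower than awk on a file of about 10M rows, so treat it as a fallback.

```shell
# fill_gaps_sh: hypothetical helper name; reads the numbered file on stdin.
# The body runs in a subshell "( ... )" so its variables stay local.
fill_gaps_sh() (
    tab=$(printf '\t')
    i=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        i=$((i + 1))
        num=${line%%"$tab"*}      # text before the first tab
        # Print placeholder numbers until the counter catches up.
        while [ "$i" -lt "$num" ]; do
            printf '%s\n' "$i"
            i=$((i + 1))
        done
        printf '%s\n' "$line"
    done
)
```

Usage: fill_gaps_sh < input.txt > filled.txt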
