Bash script to print required lines from multiple files

I need a shell script that prints lines from three files in a fixed pattern.

file1.txt, file2.txt, file3.txt

I need the output to be

line1 of file1.txt
line2 of file1.txt
line1 of file2.txt
line2 of file2.txt
line1 of file3.txt
line2 of file3.txt
line3 of file1.txt
line4 of file1.txt
line3 of file2.txt
line4 of file2.txt
line3 of file3.txt
line4 of file3.txt

...

How can we get this in a shell script? Also it should print only the non-blank lines.

Perl to the rescue:

perl -e 'open $FH[ @FH ], "<", $_ or die $! for @ARGV;
         while (grep !eof $_, @FH) {
             for my $fh (@FH) {
                 print scalar <$fh> for 1, 2;
             }
         }' -- file*.txt

It keeps all the files open at the same time (the @FH array contains the filehandles). While at least one of them hasn't reached its end, it prints two lines from each.

What about the following script, which accepts the files as parameters:

TOTAL_LINES=$(wc -l < "$1")
for n in $(seq 1 2 "$TOTAL_LINES"); do
  for file in "$@"; do
    # Quote the file name so paths containing spaces still work
    sed -n "$n{p;n;p}" "$file"
  done
done

I have assumed that all files have the same number of lines, as suggested in the comments, but it will also work when that is not the case, provided you pass the longest file as the first parameter.

A little explanation of the parts of the script you are least likely to know:

  • seq generates the sequence of numbers that for iterates over. Its syntax is seq from increment upTo, and it is used instead of the {from..upTo..increment} syntax, which doesn't accept variables
  • $@ expands to the parameters passed to the script
  • sed -n "$n{p;n;p}" is a sed command that prints nothing by default (-n), but executes p, n and p again on line $n; p prints the current line, n moves on to the next line
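
To see these pieces in isolation, here is a small sketch. The sample file and its contents are made up for illustration, and the sed expression carries an extra ; before the closing brace, which is a portability habit rather than part of the answer above.

```shell
#!/bin/bash
# Hypothetical sample data in a temporary directory, for illustration only.
tmpdir=$(mktemp -d)
printf '%s\n' one two three four five > "$tmpdir/sample.txt"

# seq FIRST INCREMENT LAST: every second number from 1 up to 5.
odd_numbers=$(seq 1 2 5)
echo "$odd_numbers"

# Print line 3 and the line after it; -n suppresses sed's default output.
n=3
pair=$(sed -n "${n}{p;n;p;}" "$tmpdir/sample.txt")
echo "$pair"

rm -r "$tmpdir"
```

The first echo shows the starting line numbers the outer loop would visit, and the second shows the two lines that sed emits for one of them.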

Consider four similar input files:

$ cat file1.txt
line1 of file1.txt
line2 of file1.txt
line3 of file1.txt
line4 of file1.txt

We create printer.sh as follows:

#!/bin/bash
LINES=2 # Configure this to set the number of consecutive lines per file

MAX_HANDLE=3
# Create descriptors 3,4,... for filename1,filename2....
for var in "$@"
do
      eval exec "$MAX_HANDLE"'<"$var"'
      ((MAX_HANDLE++))
done

# Start infinite loop
while :
do
  # First descriptor is 3
  COUNTER=3

  # Loop over all open file descriptors from 3 to MAX_HANDLE - 1
  while [  $COUNTER -lt $MAX_HANDLE ]; do
    # Read $LINES lines from the open file descriptor
    LINE_COUNTER=0
    while [  $LINE_COUNTER -lt $LINES ]; do
      read -r line <&"$COUNTER" || DONE=true
      if [[ "$DONE" = true ]]; then
        exit
      fi


      # Print the line that was read
      echo "$line"
      ((LINE_COUNTER++))
    done
    ((COUNTER++))
  done
done

On execution, each input file is opened on a new descriptor and read $LINES lines at a time (in this case 2 lines at a time). This only works for files of identical length, as the OP posited.
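
The same descriptor idea can be sketched without eval, while also dropping empty lines and tolerating files of different lengths. This is only one possible variant, not the script above: the function name interleave is hypothetical, it assumes bash 4.1 or later for the {fd} redirection syntax, and it counts blank lines toward the lines read from each file before discarding them.

```shell
#!/bin/bash
# interleave N FILE...: print N lines at a time from each file in turn,
# skipping empty lines. Hypothetical helper; assumes bash 4.1+.
interleave() {
    local per_file=$1; shift
    local fds=() fd file line i n open_count

    # Let bash pick a free descriptor for each file.
    for file in "$@"; do
        exec {fd}<"$file" || return 1
        fds+=("$fd")
    done

    open_count=${#fds[@]}
    while (( open_count > 0 )); do
        for i in "${!fds[@]}"; do
            fd=${fds[i]}
            [[ -n $fd ]] || continue            # this file is already exhausted
            for (( n = 0; n < per_file; n++ )); do
                # The second test keeps a final line that lacks a trailing newline.
                if IFS= read -r line <&"$fd" || [[ -n $line ]]; then
                    [[ -n $line ]] && printf '%s\n' "$line"
                else
                    exec {fd}<&-                # close and mark as finished
                    fds[i]=''
                    (( open_count-- ))
                    break
                fi
            done
        done
    done
}
```

It would be called as interleave 2 file1.txt file2.txt file3.txt; the loop ends once every file has been read to its end.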

$ ./printer.sh file1.txt file2.txt file3.txt file4.txt
line1 of file1.txt
line2 of file1.txt
line1 of file2.txt
line2 of file2.txt
line1 of file3.txt
line2 of file3.txt
line1 of file4.txt
line2 of file4.txt
line3 of file1.txt
line4 of file1.txt
line3 of file2.txt
line4 of file2.txt
line3 of file3.txt
line4 of file3.txt
line3 of file4.txt
line4 of file4.txt

You can use paste with awk to get your output:

paste -d $'\01' file[123].txt |
awk -F '\01' 'NR%2{for (i=1; i<=NF; i++) a[i]=$i; next} 
    {for (i=1; i<=NF; i++) print a[i] ORS $i}'

line1 of file1.txt
line2 of file1.txt
line1 of file2.txt
line2 of file2.txt
line1 of file3.txt
line2 of file3.txt
line3 of file1.txt
line4 of file1.txt
line3 of file2.txt
line4 of file2.txt
line3 of file3.txt
line4 of file3.txt

  • Using paste, we create side-by-side output delimited by control-A (ASCII 1)
  • Using awk with control-A as the field separator, we output 2 lines from each column
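
To make the intermediate step visible, the sketch below runs the same pipeline with | as the delimiter instead of control-A. The delimiter swap and the two-line sample files are assumptions made purely for readability; | would not be safe if the data itself could contain that character.

```shell
#!/bin/bash
# Hypothetical two-line sample files in a temporary directory.
tmpdir=$(mktemp -d)
for i in 1 2 3; do
    printf 'line%s of file%s.txt\n' 1 "$i" 2 "$i" > "$tmpdir/file$i.txt"
done

# Same paste call, but with a visible delimiter so the rows can be read.
pasted=$(paste -d '|' "$tmpdir"/file[123].txt)
echo "$pasted"

# Odd rows are remembered per column; even rows print the remembered
# value followed by the current one, column by column.
result=$(printf '%s\n' "$pasted" |
    awk -F '|' 'NR%2{for (i=1; i<=NF; i++) a[i]=$i; next}
        {for (i=1; i<=NF; i++) print a[i] ORS $i}')
echo "$result"

rm -r "$tmpdir"
```

Each row of the pasted output holds the same line number from every file, which is why awk only needs to buffer one row at a time.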

Lots of answers already. This one uses awk.

Create the test files:

for f in file{1,2,3}.txt; do rm -f "$f"; for n in {1,2,3,4}; do echo "line $n of file $f" >> "$f"; done; done

And the awk program:

awk '
    FNR == 1 && NR>1 {
        exit # exit after completing the first file
    }
    {
        # print 2 lines from the first file
        if (NF) print
        # only print when getline actually read a record
        if ((getline) > 0 && NF) print
        # print 2 lines from each other file
        for (i=2; i<ARGC; i++) {
            if ((getline < ARGV[i]) > 0 && NF) print
            if ((getline < ARGV[i]) > 0 && NF) print
        }
    }
' file{1,2,3}.txt

The NF tests exclude blank lines, since a blank line has zero whitespace-separated fields.
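
A quick way to check that behaviour on its own, using made-up input that contains one empty line and one line of spaces only:

```shell
#!/bin/bash
# NF is 0 for empty and whitespace-only lines, so the bare pattern NF
# selects every other line; the default action prints it.
non_blank=$(printf 'first\n\n   \nsecond\n' | awk 'NF')
echo "$non_blank"
```

Note that the whitespace-only line is dropped as well, which a plain test for an empty line would not do.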

line 1 of file file1.txt
line 2 of file file1.txt
line 1 of file file2.txt
line 2 of file file2.txt
line 1 of file file3.txt
line 2 of file file3.txt
line 3 of file file1.txt
line 4 of file file1.txt
line 3 of file file2.txt
line 4 of file file2.txt
line 3 of file file3.txt
line 4 of file file3.txt

This may not be the most efficient approach, but this will work, assuming that you have all your files in $files, and $total_lines contains the number of lines in each file:

for line in $(seq 1 2 "$total_lines")
do
    for file in $files
    do
        sed '/^$/d' "$file" | sed -n "$line,$((line + 1))p"
    done
done

sed '/^$/d' removes all the empty lines from the stream;

sed -n "$line,$((line + 1))p" prints the two lines starting at line $line, so each pass emits two lines per file, as the question asks.
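
The two sed stages can be tried separately on made-up input. The sketch below shows the blank-line filter followed by two ways of selecting by line number: deleting everything except one line, and printing a two-line range with -n.

```shell
#!/bin/bash
# Made-up input with one blank line in the middle.
input=$(printf 'alpha\n\nbeta\ngamma')

# Step 1: drop empty lines, so later line numbers count only non-blank lines.
non_blank=$(printf '%s\n' "$input" | sed '/^$/d')

# Step 2a: keep only line 2 by deleting every line that is not line 2.
second=$(printf '%s\n' "$non_blank" | sed '2!d')

# Step 2b: print a two-line range instead; -n turns off the default output.
pair=$(printf '%s\n' "$non_blank" | sed -n '2,3p')

echo "$second"
echo "$pair"
```

Because the blank line is removed first, line 2 of the filtered stream is beta rather than the empty line.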

Using paste and awk.

$ cat test.sh 
paste -d '|' file* | awk -F\| '{
    if(NR % 2 == 1) {
        file1 = $1; 
        file2 = $2; 
        file3 = $3; 
    } else {
        file1 = file1 "\n" $1; 
        file2 = file2 "\n" $2; 
        file3 = file3 "\n" $3; 
        print file1;
        print file2;
        print file3;
    }
}'

Because all files have the same length, we can paste all the files together first and print whenever the row number is even.

If you don't mind creating intermediate/temporary files, split(1), which is part of coreutils on every Linux distribution, might be handy:

#!/bin/bash

# Split files every 2 lines using a numeric suffix 
for f in file*.txt; do
    split -d -l 2 "${f}" "${f}"split
done

# Reverse intermediate file names, so we can glob them in numeric order 
for f in file*split*; do
    mv "${f}" "reversed$(echo "${f}" | rev)"
done

cat reversed* && rm reversed*
