Bash script to print required lines from multiple files

I need a shell script that prints lines from three files in a fixed pattern. The files are:

file1.txt, file2.txt, file3.txt

I need the output to be:

line1 of file1.txt
line2 of file1.txt
line1 of file2.txt
line2 of file2.txt
line1 of file3.txt
line2 of file3.txt
line3 of file1.txt
line4 of file1.txt
line3 of file2.txt
line4 of file2.txt
line3 of file3.txt
line4 of file3.txt

...

How can we get this in a shell script? Also, it should print only the non-blank lines.

Perl to the rescue:

perl -e 'open $FH[ @FH ], "<", $_ or die $! for @ARGV;
         while (grep !eof $_, @FH) {
             for my $fh (@FH) {
                 for (1, 2) {
                     my $line = <$fh>;
                     # skip exhausted files and blank lines
                     print $line if defined $line && $line =~ /\S/;
                 }
             }
         }' -- file*.txt

It keeps all the files open at the same time (the @FH array contains the filehandles). While at least one of them has not reached end of file, it prints two lines from each.

What about the following script, which accepts the files as parameters:

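# Number of lines in the first file, which drives the loop
TOTAL_LINES=$(wc -l < "$1")
# n takes the odd line numbers 1, 3, 5, ...
for n in $(seq 1 2 "$TOTAL_LINES"); do
  for file in "$@"; do
    # Print line n and the line after it
    sed -n "${n}{p;n;p}" "$file"
  done
done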

I've assumed that all files have the same number of lines, as suggested in the comments, but the script also works when they don't, provided you pass the longest file as the first parameter.

A little explanation of the parts of the script you're less likely to know:

  • seq generates the sequence of numbers that for iterates over. Its syntax is seq from increment upTo, and it's used instead of the {from..upTo..increment} syntax, which doesn't accept variables
  • $@ holds the parameters passed to the script; quoted as "$@", each parameter expands to a separate word
  • sed -n "$n{p;n;p}" is a sed command that doesn't print anything by default (-n), but executes p, n and p again when it reaches line $n; p prints the current line, n moves to the next line
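
To try the sed expression in isolation, here is a minimal sketch. The function name print_pair is made up for illustration and is not part of the answer above.

```shell
#!/bin/bash
# print_pair FILE N: print line N of FILE and the line after it.
# (Hypothetical helper, named here only for illustration.)
print_pair() {
    local file=$1 n=$2
    # -n turns off automatic printing. At line N: p prints it,
    # n loads the next line, and the second p prints that one.
    sed -n "${n}{p;n;p}" "$file"
}

# Usage (assuming file1.txt exists):
#   print_pair file1.txt 3    # prints lines 3 and 4
#   seq 1 2 6                 # prints 1, 3 and 5, the values the outer loop uses
```

Calling print_pair once per file for each value produced by seq reproduces the nested loops of the script.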

Consider four similar input files:

$ cat file1.txt
line1 of file1.txt
line2 of file1.txt
line3 of file1.txt
line4 of file1.txt

We create printer.sh as follows:

#!/bin/bash
LINES=2 # Configure this to set the number of consecutive lines per file

MAX_HANDLE=3
# Create descriptors 3,4,... for filename1,filename2....
for var in "$@"
do
      eval exec "$MAX_HANDLE"'<"$var"'
      ((MAX_HANDLE++))
done

# Start infinite loop
while :
do
  # First descriptor is 3
  COUNTER=3

  # Loop over all open file descriptors from 3 to MAX_HANDLE - 1
  while [  $COUNTER -lt $MAX_HANDLE ]; do
    # Read $LINES lines from the open file descriptor
    LINE_COUNTER=0
    while [  $LINE_COUNTER -lt $LINES ]; do
      read -r line <&"$COUNTER" || DONE=true
      if [[ "$DONE" = true ]]; then
        exit
      fi


      # Print the line that was read
      echo "$line"
      ((LINE_COUNTER++))
    done
    ((COUNTER++))
  done
done

When this runs, each input file is opened on its own file descriptor, and $LINES lines are read from each descriptor in turn (2 lines at a time in this case). The script exits as soon as any file runs out, so it only works for files of identical length, as the OP posited.
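
If the files may differ in length or contain blank lines, the same file-descriptor idea can be extended. The following is only a sketch under those assumptions, not the original printer.sh: the function name interleave is invented, and it relies on the {fd} redirection syntax, which needs bash 4.1 or later.

```shell
#!/bin/bash
# interleave N FILE...: print N lines at a time from each file in turn,
# skipping blank lines and continuing until every file is exhausted.
# (Hypothetical variant; requires bash 4.1+ for the {fd} redirections.)
interleave() {
    local per_file=$1; shift
    local fds=() fd file line i progress=1

    # Open each file on a descriptor chosen by bash.
    for file in "$@"; do
        exec {fd}<"$file" || return 1
        fds+=("$fd")
    done

    # Keep cycling while at least one file still yielded a line.
    while (( progress )); do
        progress=0
        for fd in "${fds[@]}"; do
            for (( i = 0; i < per_file; i++ )); do
                # Stop on end of file, but keep a last line that lacks a newline.
                IFS= read -r line <&"$fd" || [[ -n $line ]] || break
                progress=1
                # Skip lines that are empty or contain only whitespace.
                [[ $line =~ ^[[:space:]]*$ ]] || printf '%s\n' "$line"
            done
        done
    done

    # Close the descriptors again.
    for fd in "${fds[@]}"; do
        exec {fd}<&-
    done
}
```

For example, interleave 2 file1.txt file2.txt file3.txt would print two lines per file per round. Note that a blank line still counts as one of the N lines read in a round; it is just not printed.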

$ ./printer.sh file1.txt file2.txt file3.txt file4.txt
line1 of file1.txt
line2 of file1.txt
line1 of file2.txt
line2 of file2.txt
line1 of file3.txt
line2 of file3.txt
line1 of file4.txt
line2 of file4.txt
line3 of file1.txt
line4 of file1.txt
line3 of file2.txt
line4 of file2.txt
line3 of file3.txt
line4 of file3.txt
line3 of file4.txt
line4 of file4.txt

You can use paste with awk to get your output:

paste -d $'\01' file[123].txt |
awk -F '\01' 'NR%2{for (i=1; i<=NF; i++) a[i]=$i; next} 
    {for (i=1; i<=NF; i++) print a[i] ORS $i}'

line1 of file1.txt
line2 of file1.txt
line1 of file2.txt
line2 of file2.txt
line1 of file3.txt
line2 of file3.txt
line3 of file1.txt
line4 of file1.txt
line3 of file2.txt
line4 of file2.txt
line3 of file3.txt
line4 of file3.txt

  • Using paste, we create side-by-side output delimited by control-A (ASCII 1)
  • Using awk with control-A as the field separator, we output 2 lines from each column
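
The command above prints every cell, including empty ones. A possible variant that also drops blank lines and copes with an odd number of rows is sketched below; the function name interleave_pairs is an assumption for illustration, and the control-A character is passed to awk literally rather than as an escape sequence.

```shell
#!/bin/bash
# interleave_pairs FILE...: paste the files side by side with a control-A
# delimiter, then print two rows of each column at a time, skipping
# cells that are empty or contain only spaces/tabs.
# (Hypothetical helper, named here only for illustration.)
interleave_pairs() {
    paste -d $'\001' "$@" |
    awk -F $'\001' '
        # Odd rows: remember every column until the matching even row arrives.
        NR % 2 {
            for (i = 1; i <= NF; i++) a[i] = $i
            n = NF; pending = 1
            next
        }
        # Even rows: per column, print the saved line, then the current one.
        {
            for (i = 1; i <= NF; i++) {
                if (a[i] ~ /[^ \t]/) print a[i]
                if ($i ~ /[^ \t]/) print $i
            }
            pending = 0
        }
        # A trailing odd row has no partner; print what was saved.
        END {
            if (pending)
                for (i = 1; i <= n; i++)
                    if (a[i] ~ /[^ \t]/) print a[i]
        }
    '
}
```

Because paste pads shorter files with empty cells, the emptiness check also takes care of files of different lengths.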

Lots of answers. This one uses awk.

Create the test files:

for f in file{1,2,3}.txt; do rm -f "$f"; for n in {1,2,3,4}; do echo "line $n of file $f" >> "$f"; done; done

And the awk program:

awk '
    FNR == 1 && NR > 1 {
        exit # exit after completing the first file
    }
    {
        # print 2 lines from the first file
        # (assumes the first file has an even number of lines)
        if (NF) print
        if ((getline) > 0 && NF) print
        # print 2 lines from each other file, ignoring files that have ended
        for (i = 2; i < ARGC; i++) {
            if ((getline < ARGV[i]) > 0 && NF) print
            if ((getline < ARGV[i]) > 0 && NF) print
        }
    }
' file{1,2,3}.txt

The if (NF) print lines exclude blank lines, since a blank line has zero whitespace-separated fields.
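
The NF test can also be used on its own as a blank-line filter. A minimal sketch, with the function name drop_blank chosen only for illustration:

```shell
#!/bin/bash
# drop_blank: copy stdin to stdout without empty or whitespace-only lines.
# NF is the number of fields on the current line; 0 counts as false, so the
# bare pattern NF selects only lines that contain at least one field.
# (Hypothetical helper, named here only for illustration.)
drop_blank() {
    awk 'NF'
}

# Usage:
#   drop_blank < file1.txt
```

Any of the other answers that do not filter blank lines themselves could be piped through such a filter.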

line 1 of file file1.txt
line 2 of file file1.txt
line 1 of file file2.txt
line 2 of file file2.txt
line 1 of file file3.txt
line 2 of file file3.txt
line 3 of file file1.txt
line 4 of file file1.txt
line 3 of file file2.txt
line 4 of file file2.txt
line 3 of file file3.txt
line 4 of file file3.txt

This may not be the most efficient approach, but it will work, assuming that $files holds all your file names and $total_lines contains the number of lines in each file:

for line in $(seq 1 "$total_lines")
do
    # $files is left unquoted on purpose so it splits into file names
    for file in $files
    do
        sed '/^$/d' "$file" | sed $line'!d'
    done
done

sed '/^$/d' removes all the empty lines from the stream;

sed $line'!d' deletes every line except line number $line, so only that line is printed.
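
The two sed commands can be wrapped in a small function to see what they do on a single file. This is just a sketch; nth_nonempty_line is a made-up name.

```shell
#!/bin/bash
# nth_nonempty_line FILE N: print the Nth line of FILE, counting only
# non-empty lines. (Hypothetical helper, named here only for illustration.)
nth_nonempty_line() {
    local file=$1 n=$2
    # First sed: delete lines that are completely empty.
    # Second sed: delete every line except number N of what remains.
    sed '/^$/d' "$file" | sed "${n}"'!d'
}

# Usage:
#   nth_nonempty_line file1.txt 2
```

Note that /^$/ matches only completely empty lines; a line containing just spaces or tabs is kept and still counts.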

Using paste and awk.

$ cat test.sh 
paste -d '|' file* | awk -F\| '{
    if(NR % 2 == 1) {
        file1 = $1; 
        file2 = $2; 
        file3 = $3; 
    } else {
        file1 = file1 "\n" $1; 
        file2 = file2 "\n" $2; 
        file3 = file3 "\n" $3; 
        print file1;
        print file2;
        print file3;
    }
}'

Because all files have the same length, we can paste all the files side by side first, then print whenever the row number is even.

If you don't mind creating intermediate/temporary files, split(1), which is part of coreutils on every Linux distribution, might be handy:

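#!/bin/bash

# Split files every 2 lines using a numeric suffix (file1.txtsplit00, ...)
for f in file*.txt; do
    split -d -l 2 "${f}" "${f}split"
done

# Reverse intermediate file names, so we can glob them in chunk order.
# Reversing also reverses the digits ("10" becomes "01"), so the order is
# only correct for up to 10 chunks (20 lines) per file.
for f in file*split*; do
    mv "${f}" "reversed$(echo "${f}" | rev)"
done

cat reversed* && rm reversed*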
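
A possible variant puts the chunk number first in the intermediate names, so no renaming step is needed and the order stays correct with ten or more chunks per file. This is a sketch only: the function name interleave_with_split is invented, and it assumes GNU split (for -d, -a and --additional-suffix) and mktemp.

```shell
#!/bin/bash
# interleave_with_split FILE...: split each file into 2-line chunks inside a
# temporary directory, naming chunks <chunk number>.<file index> so that a
# plain glob lists them chunk by chunk, then concatenate them.
# (Hypothetical helper; assumes GNU split and fewer than 10000 chunks per file.)
interleave_with_split() {
    local tmp f i=0
    tmp=$(mktemp -d) || return 1
    for f in "$@"; do
        # Produces e.g. $tmp/0000.000, $tmp/0001.000 for the first file.
        split -d -a 4 -l 2 --additional-suffix=".$(printf '%03d' "$i")" "$f" "$tmp/"
        i=$((i + 1))
    done
    # Concatenate in glob order and drop blank lines.
    cat "$tmp"/* | grep -v '^[[:space:]]*$'
    rm -rf "$tmp"
}

# Usage:
#   interleave_with_split file1.txt file2.txt file3.txt
```

Working in a temporary directory also keeps the intermediate files out of the current directory.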
