
Read line range from a file and find largest value within the range in another file

I'm looking to extract the largest value from a range of line numbers in a file, with the range being read from another file.

Define three files:

position_file: Containing two columns of integers that define a range of line numbers, such that col1[i] < col2[i]

full_data_file: Containing a single column of numerical data (>=0)

extracted_data_file: Containing for each line in position_file the largest value in full_data_file where the line number in full_data_file falls within the range defined in position_file

cat position_file
1 3
5 7

cat full_data_file
1
4.3
5.2
2.0
0.1
0
4
9

cat extracted_data_file
5.2
4

My current way of doing this is

while read pos1 pos2; do
    awk -v p1="$pos1" -v p2="$pos2" 'BEGIN {max=0} NR>=p1 && NR<=p2 && $1>max {max=$1} END {print max}' < full_data_file >> extracted_data_file
done < position_file

This is inefficient because full_data_file is re-read once for every line of position_file, which is very slow. I'm looking for a way to do this in a single pass. I'm not very experienced with arrays in awk, but I imagine the solution will probably (though not necessarily) use them.

Thank you very much for your help.

You can use this awk command:

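awk 'FNR==NR { a[FNR] = $1; next }   # first file: store each value by its line number
     { max = a[$1]                   # second file: start from the first line of the range
       for (i = $1 + 1; i <= $2; i++) if (a[i] > max) max = a[i]
       print max }' full_data_file position_file > extracted_data_file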

cat extracted_data_file
5.2
4
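
To try this end to end, here is a minimal sketch that recreates the sample files from the question in a scratch directory and runs the same two-file awk logic. The function name max_in_ranges and the use of mktemp are my own choices for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical wrapper (name chosen for illustration only):
#   $1 = data file, $2 = file of "start end" line ranges
max_in_ranges() {
    awk 'FNR==NR { a[FNR] = $1; next }   # true only while reading the first file
         { max = a[$1]                   # seed with the value on the first line of the range
           for (i = $1 + 1; i <= $2; i++) if (a[i] > max) max = a[i]
           print max }' "$1" "$2"
}

# Recreate the sample inputs from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1 3' '5 7' > position_file
printf '%s\n' 1 4.3 5.2 2.0 0.1 0 4 9 > full_data_file

max_in_ranges full_data_file position_file > extracted_data_file
cat extracted_data_file
```

This reads full_data_file once and keeps all of it in an awk array, so memory use grows with the size of that file, and the work per range is proportional to the range length. Seeding max with a[$1] rather than 0 also means the result does not depend on the data being non-negative.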

