
bash: Reading first 'n' entries in a file

I have a series of very big single-lined files of space-separated values. Each looks like

0.993194 0.9684194 0.846847658 1.0 1.0 1.0 1.0 0.78499 0.54879564 0.9998545 ...

I would like to copy the first n elements of each file.

I could convert the spaces into newlines ( cat file.txt | tr ' ' '\n' > file2.txt ) and then read it line by line, saving each line to a new file ( head -n $n file2.txt | while read line; do echo $line >> file3.txt; done ), but that would be very slow. (Above code not tested.)
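A streamed version of the same idea, without the intermediate file or the per-line shell loop, would presumably look like this (also untested, same hypothetical filenames):

n=5
tr ' ' '\n' < file.txt | head -n "$n" > file3.txt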

How can I efficiently copy the first n values of a single-lined file?

Note: I am fine with copying the first n characters even if this corresponds to an undefined number of values.
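For that character-count variant, something like the following should work (untested sketch; head -c is a GNU/BSD extension rather than strict POSIX, cut -c is POSIX, and 100 is just a placeholder count):

head -c 100 file.txt > file3.txt    # first 100 bytes, stops reading early
cut -c 1-100 file.txt > file3.txt   # first 100 characters of the line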

How about just using awk and specifying the number of fields you want?

awk -v n=5 '{for(i=1;i<=n;i++) print $i}' file
0.993194
0.9684194
0.846847658
1.0
1.0

(or) to print on the same line using printf:

awk -v n=5 '{for(i=1;i<=n;i++) printf "%s ",$i}' file
0.993194 0.9684194 0.846847658 1.0 1.0
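If you want a trailing newline instead of a trailing space, the separator can be chosen per field (a small variation on the same one-liner, not part of the original answer):

awk -v n=5 '{for(i=1;i<=n;i++) printf "%s%s",$i,(i<n?OFS:ORS)}' file
0.993194 0.9684194 0.846847658 1.0 1.0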

(or) using cut with POSIX-compliant options: -d to set the delimiter and -f 1-5 to select fields 1 through 5.

cut -d' ' -f 1-5 file
0.993194 0.9684194 0.846847658 1.0 1.0
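To keep n in a shell variable with cut, the field range can be built the same way as in the awk -v examples (a minor variation, assuming n is set in the shell):

n=5
cut -d' ' -f "1-$n" file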

I'd use a carefully designed regex with egrep, using the -o flag so it prints only the matching part of the line:

egrep -e '^([0-9.]+[ ]*){3}' -o file.txt

Prints out:

0.993194 0.9684194 0.846847658

As grep is a pretty well-known and very heavily-optimized tool, this performs pretty well; I just tried it on a 3-megabyte text file and it didn't take significantly longer than it took on a 30-byte text file.
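For a variable count, the repetition in the pattern can be parameterized the same way (a sketch using grep -E, which is equivalent to egrep; n here is a hypothetical shell variable expanded into the pattern):

n=3
grep -Eo "^([0-9.]+[ ]*){$n}" file.txt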
