Reading from file bash Linux

Question

I am having a hard time with the following bash script:

Basically, the script receives a directory and searches all of the folders in that directory for files that end with .log. It should then print to stdout all the lines from those files, sorted by the date they were written.

My script is this:

#!/bin/bash

find . -name ".*log" | cat *.log |  sort --stable --reverse --key=2,3

When I run the script it does return the list, but the sort doesn't work properly. My guess is that some records contain \n, which makes them continue on a new line.

Is there a way to ignore the \n characters inside a record while still printing each record on its own line?

Thank you!

xxd command output:

ise@ise-virtual-machine:~$ xxd /home/ise/Downloads/f1.log
00000000: 3230 3139 2d30 382d 3232 5431 333a 3333  2019-08-22T13:33
00000010: 3a34 342e 3132 3334 3536 3738 3920 4865  :44.123456789 He
00000020: 6c6c 6f0a 576f 726c 640a 0032 3032 302d  llo.World..2020-
00000030: 3031 2d30 3154 3131 3a32 323a 3333 2e31  01-01T11:22:33.1
00000040: 3233 3435 3637 3839 206c 6174 650a       23456789 late.
ise@ise-virtual-machine:~$ xxd /home/ise/Downloads/f2.log
00000000: 3230 3139 2d30 392d 3434 5431 333a 3434  2019-09-44T13:44
00000010: 3a32 312e 3938 3736 3534 3332 3120 5369  :21.987654321 Si
00000020: 6d70 6c65 206c 696e 650a                 mple line.
ise@ise-virtual-machine:~$ xxd /home/ise/Downloads/f3.log
00000000: 3230 3139 2d30 382d 3232 5431 333a 3333  2019-08-22T13:33
00000010: 3a34 342e 3132 3334 3536 3738 3920 4865  :44.123456789 He
00000020: 6c6c 6f0a 576f 726c 6420 320a 0032 3032  llo.World 2..202
00000030: 302d 3031 2d30 3154 3131 3a32 323a 3333  0-01-01T11:22:33
00000040: 2e31 3233 3435 3637 3839 206c 6174 6520  .123456789 late 
00000050: 320a                                     2.

Answer 1

Given that the entries in the log file are terminated with \0 (NUL), find, sed and sort can be combined:

find . -name '*.log' | xargs sed -z 's/\n//g' | sort -z --key=2,3 --reverse
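
To see what the sed -z step does on its own, here is a minimal sketch with made-up input (not the OP's files), assuming GNU sed. With -z, sed splits its input on NUL instead of newline, so s/\n//g deletes the newlines inside each record and a folded record comes out joined. The trailing tr is only there to make the record boundaries visible.

```shell
#!/bin/bash
# Made-up input: two NUL-separated records, the first folded over two lines.
# Requires GNU sed for the -z option.
joined=$(printf 'Hello\nWorld\n\0late\n' | sed -z 's/\n//g' | tr '\0' '\n')

# Each record is now a single line; "Hello" and "World" have been joined.
printf '%s\n' "$joined"
```

Note that the pipeline above passes file names from find to xargs separated by newlines, so names containing whitespace would be split; find -print0 piped into xargs -0 avoids that.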

Answer 2

Assuming each record in the file starts with the date, so that the --key=2,3 option is not necessary, please try:

find . -name "*.log" -exec cat '{}' \; | sort -z | xargs -I{} -0 echo "{}"

The final xargs ... echo ... command is needed to print the null-terminated records properly.
If you still need the --key option, please modify the code as you like. I don't know yet what the lines look like.

[UPDATE]

Based on the information provided by the OP, I assume the format of the log files is:

Each record starts with the date in "yyyy-mm-ddTHH:MM:SS.nanosec" format, so a simple dictionary-order sort can be applied.
Each record ends with "\n\0", except for the last record of the file, which ends with just "\n".
Each record may contain newline character(s) in the middle, as part of the record, for line folding.
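
As a rough illustration of that assumed format, the sketch below builds a small sample file and walks through its records in bash. The file and variable names are my own, and the loop relies on bash's read -d '', which splits on NUL.

```shell
#!/bin/bash
# Sample file in the assumed format (the temporary file is illustrative):
# records are separated by "\n\0"; the final record ends with "\n" only.
sample=$(mktemp)
printf '2019-08-22T13:33:44.123456789 Hello\nWorld\n\0' > "$sample"
printf '2020-01-01T11:22:33.123456789 late\n' >> "$sample"

nl='
'
count=0
# read -d '' reads up to the next NUL. The last record has no trailing NUL,
# so read returns non-zero there; the extra test keeps it from being dropped.
while IFS= read -r -d '' record || [ -n "$record" ]; do
    count=$((count + 1))
    # Print only the first line of each record (the part carrying the date).
    printf 'record %d: %s\n' "$count" "${record%%"$nl"*}"
done < "$sample"
rm -f "$sample"
```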

Then how about:

find . -name "*.log" -type f -exec cat "{}" \; -exec echo -ne "\0" \; | sort -z

echo -ne "\0" appends a null character after the last record of each file. Otherwise that record would be merged with the first record of the next file.
The -z option makes sort treat the null character as the record separator.
No other sort options are required so far.
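
The effect of the appended null character can be checked by counting terminators. This is only a sketch with two made-up files; the names and the count_nul helper are hypothetical and not part of the original command.

```shell
#!/bin/bash
# Two tiny stand-in logs. As in the assumed format, the last record of each
# file ends with "\n" and has no trailing NUL.
dir=$(mktemp -d)
printf 'B first\n\0C second\n' > "$dir/one.log"
printf 'A third\n'             > "$dir/two.log"

# Hypothetical helper: count the NUL bytes in a stream.
count_nul() { tr -cd '\0' | wc -c; }

# Plain concatenation: "C second" and "A third" share no separator,
# so sort -z would see them as one record.
plain=$(cat "$dir/one.log" "$dir/two.log" | count_nul)

# Appending a NUL after each file gives every record its own terminator.
fixed=$(for f in "$dir"/*.log; do cat "$f"; printf '\0'; done | count_nul)

echo "terminators without fix: $plain, with fix: $fixed"
rm -rf "$dir"
```

With three records in total, only the second count matches the number of records.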

Result with the input posted by the OP:

2019-08-22T13:33:44.123456789 Hello
World
2019-08-22T13:33:44.123456789 Hello
World 2
2019-09-44T13:44:21.987654321 Simple line
2020-01-01T11:22:33.123456789 late
2020-01-01T11:22:33.123456789 late 2

The output still keeps the null character "\0" at the end of each record. If you want to trim it off, add tr -d "\0" to the end of the pipeline:

find . -name "*.log" -type f -exec cat "{}" \; -exec echo -ne "\0" \; | sort -z | tr -d "\0"
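
To try this without the original logs, the following sketch recreates the three files from the xxd dumps in a temporary directory and runs the same pipeline. The directory and variable names are my own. It assumes GNU find, sort and tr, and an external echo that understands -ne (as GNU coreutils does). LC_ALL=C is my addition, so that sort compares bytes regardless of the current locale.

```shell
#!/bin/bash
# Recreate the sample logs from the hex dumps in a scratch directory.
workdir=$(mktemp -d)
{
    printf '2019-08-22T13:33:44.123456789 Hello\nWorld\n\0'
    printf '2020-01-01T11:22:33.123456789 late\n'
} > "$workdir/f1.log"
printf '2019-09-44T13:44:21.987654321 Simple line\n' > "$workdir/f2.log"
{
    printf '2019-08-22T13:33:44.123456789 Hello\nWorld 2\n\0'
    printf '2020-01-01T11:22:33.123456789 late 2\n'
} > "$workdir/f3.log"

# Same pipeline as above, pointed at the scratch directory.
result=$(find "$workdir" -name "*.log" -type f -exec cat "{}" \; -exec echo -ne "\0" \; \
    | LC_ALL=C sort -z | tr -d "\0")

printf '%s\n' "$result"
rm -rf "$workdir"
```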

Hope this helps.
