
How to grep two patterns at once and have the result in one string?

I have existing log files that contain, among others, lines of the following type:

2018-05-14T10:10:22.769029+03:00 timom usbmonitor: [INFORMATION 6] [FILE: UsbChecker.cpp:51][FUNC: vendorCheck][MSG: USB vendors changed: "0403 14e1 05e3 05e3 03f0 0403 0bda 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b" ]

From these files I want to grep lines like the one above and keep only the timestamp from the beginning and the text inside the quotes, so that I get a nice, compact output:

2018-05-14T10:10:22.769029+03:00 0403 14e1 05e3 05e3 03f0 0403 0bda 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b

Is there a way to do this with a one-liner?
I'm looking for a way to efficiently get the desired output without the need to loop over grepped lines. I have thousands of log files each of which may have hundreds of matches so the grep/sed/whatever needs to be efficient.

So far I've done it like this:

#!/bin/bash
# usage() was not shown in the original post; this is a minimal placeholder
usage() {
    echo "Usage: $0 -d <input dir> -o <output dir>" >&2
}
INPUTDIR=
OUTPUTDIR=
# Leading ":" selects silent error handling; -h takes no argument
while getopts ":hd:o:" OPTION; do
    case $OPTION in
        h)
            usage
            exit 1
            ;;
        d)
            INPUTDIR=$OPTARG
            ;;
        o)
            OUTPUTDIR=$OPTARG
            ;;
        *)
            # Unknown option ("?") or missing argument (":")
            usage
            exit 1
            ;;
    esac
done
if [ -z "$INPUTDIR" ] || [ -z "$OUTPUTDIR" ]; then
    echo "BAD ARGUMENTS: both directories must be given" >&2
    usage
    exit 1
fi
# Write the summary into the output directory
OUTPUTFILE="$OUTPUTDIR/$(date +%Y%m%d%H%M%S)-usb-analysis-summary"
for i in $( ls "$INPUTDIR" ); do
    # Interesting files are of format <number>_<number>
    if [ "$(echo "$i" | grep -Ev "^[0-9]+_[0-9]+$")" ] ; then
        echo "Skipping $i"
        continue
    fi
    grep vendorCheck "$INPUTDIR/$i" | while read -r l ; do
        # We do know timestamp is 32 characters long. GEFN
        echo "$l" | sed -r "s|^(.{32}).*changed: \"(.*)\".*|\1 \2|" >>"$OUTPUTFILE"
    done
done

But this is not optimal, as I'm now looping over the files and then looping over the grep matches from each file.

I tried

grep "vendorCheck" $INPUTDIR/$i | sed -r "s|^(.{32}).*changed: \"(.*)\".*|\1 \2|"

But this removes line breaks.
Then if I put multiple patterns in one grep I'm also in trouble with formatting; I need to get the timestamp and text inside quotes to one line, and next similar match to next line.

sed can do the line selection (matching) and the editing in one go.
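As a minimal sketch of that idea, the snippet below feeds the sample line from the question, plus one unrelated line, through a single sed call. The variable names `line` and `result` are only illustrative, and `-r` assumes GNU sed. Note that with the `p` flag on the substitution, a vendorCheck line that does not fit the pattern is dropped rather than printed unchanged.

```shell
#!/bin/bash
# Sample line taken from the question
line='2018-05-14T10:10:22.769029+03:00 timom usbmonitor: [INFORMATION 6] [FILE: UsbChecker.cpp:51][FUNC: vendorCheck][MSG: USB vendors changed: "0403 14e1 05e3 05e3 03f0 0403 0bda 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b" ]'

# -n suppresses default output; /vendorCheck/ selects the lines;
# the s///p rewrites each selected line and prints it if the substitution succeeded
result=$(printf '%s\n' "$line" 'some unrelated line' |
    sed -r -n '/vendorCheck/ s/^(.{32}).*changed: "(.*)".*/\1 \2/p')

echo "$result"
# prints: 2018-05-14T10:10:22.769029+03:00 0403 14e1 05e3 05e3 03f0 0403 0bda 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b
```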

You could use $(...) to generate sed's list of input files and really get it all onto one line, but parsing the output of ls isn't ideal, and you said in a comment that you need the file names, so...

Rather than

( cd "$INPUTDIR" && sed -r -n '/vendorCheck/{s/^(.{32}).*changed: "(.*)".*/\1 \2/; p;}' $( ls -1 | grep -E '^[0-9]+_[0-9]+$' ) ) >> "$OUTPUTFILE"

you can embed some whitespace to make it a little less ugly without changing the "one-liner" functionality, and a loop can replace the ls:

for f in "$INPUTDIR"/[0-9]*_[0-9]*   # limit input, not a definitive check
do [[ ${f##*/} =~ ^[0-9]+_[0-9]+$ ]] || continue   # CONFIRM filename match (base name only)
   [[ -f $f ]] || continue   # and assert file, not dir
   sed -r -n "/vendorCheck/{
      s/^(.{32}).*changed: \"(.*)\".*/\1 \2/
      s|^|$f: |
      p
   }" "$f"   # the "s|^|$f: |" is a placeholder for your need for the name; "|" avoids clashing with "/" in the path
done >> "$OUTPUTFILE"

NOTE: I deleted my test data, so this rework wasn't vetted as carefully. Let me know if anyone sees a typo.
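If starting one sed per file turns out to be too slow, a possible variant of the loop above collects the matching files first and then runs a single grep | sed pipeline over all of them. This is only a sketch: `extract_vendor_changes` is a hypothetical helper name, `-r` assumes GNU sed, and the regex assumes that the directory path contains no ":". With a very large number of files the argument list could exceed the system limit, in which case the list would have to be split into batches.

```shell
#!/bin/bash
# Hypothetical helper: print "<file>: <timestamp> <vendor list>" for every
# vendorCheck line in the <number>_<number> files of the given directory.
extract_vendor_changes() {
    local dir=$1 f
    local files=()
    for f in "$dir"/[0-9]*_[0-9]*; do
        # Keep only regular files whose base name is exactly <number>_<number>
        [[ -f $f && ${f##*/} =~ ^[0-9]+_[0-9]+$ ]] && files+=("$f")
    done
    # Nothing to do if no file qualified
    (( ${#files[@]} )) || return 0
    # grep -H prefixes each match with "file:", even for a single file;
    # sed then keeps the file name, the 32-character timestamp and the quoted text
    grep -H 'vendorCheck' "${files[@]}" |
        sed -r 's/^([^:]*):(.{32}).*changed: "(.*)".*/\1: \2 \3/'
}
```

It could be called as extract_vendor_changes "$INPUTDIR" >> "$OUTPUTFILE", which starts two processes in total instead of one per file.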
