Regular expression: extract info from utility output with grep

Question

The nconvert utility prints information about a file:

nconvert -info file.tiff

Output:

** NCONVERT v7.00 (c) 1991-2017 Pierre-E Gougelet (Apr 18 2017/09:49:26) **
        Version for Windows NT/9x/2000/Xp/Vista/7  (All rights reserved)
** This is freeware software (for non-commercial use)

Over...

file.tiff : Success
    Format               : TIFF
    Name                 : tiff
    Compression          : CCITT Group 4
    Width                : 3194
    Height               : 5056
    Components per pixel : 1
    Bits per component   : 1
    Depth                : 1
    # colors             : 2
    Color model          : RGB
    Bytes Per Plane      : 400
    Orientation          : Top Left
    Xdpi                 : 600
    Ydpi                 : 600
    Page(s)              : 30
    Info:
      Photometric Interpretation: White=0
      PhotometricInterpretation: 0
      PlanarConfiguration: 1
      SamplesPerPixel: 1
      Software: LIBFORMAT (c) Pierre-e Gougelet
    Metadata             : ( EXIF )

I need to extract numeric information using grep. For example, to get the number of pages, I use

nconvert -info file.tiff | grep -oP "(?<=Page\(s\)).*$"

I get:

      : 30

But I need only the number 30.

The modification below does not give the desired result either:

nconvert -info efile.tiff | grep -oP "(?<=Page\(s\)\s+\:).*$"

How can I get the value after the colon?

Answer 1

You can use this grep command:

nconvert -info efile.tiff | grep -oP 'Page\(s\)\h*:\h*\K\d+'
30

\K resets the start of the reported match, so everything matched before it is left out of the output.

You can also use awk:

nconvert -info efile.tiff | awk -F '[: \t]*' '$2=="Page(s)"{print $3}'
30
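
The awk approach can be generalized to other fields of the nconvert output. The following is only a sketch: info_field is a hypothetical helper name, and the sample text is copied from the output in the question so that the snippet runs without nconvert installed. It splits each line at the first colon and trims whitespace around the label and the value.

```shell
# Sample lines copied from the nconvert output shown in the question.
sample_info='    Width                : 3194
    Height               : 5056
    Page(s)              : 30'

# info_field NAME: hypothetical helper; prints the value of the field called NAME.
info_field() {
    awk -F ':' -v name="$1" '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)      # trim the label
            if (key == name) {
                val = substr($0, index($0, ":") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim the value
                print val
            }
        }'
}

pages=$(printf '%s\n' "$sample_info" | info_field 'Page(s)')
width=$(printf '%s\n' "$sample_info" | info_field 'Width')
echo "$pages $width"
```

With real data, the same function would be used as nconvert -info file.tiff | info_field 'Page(s)'.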

Answer 2

Replace the positive lookbehind with the \K match reset operator. A lookbehind cannot contain an unbounded quantifier such as \s+, whereas \K allows a variable-width pattern before the value you need to extract:

grep -oP 'Page\(s\)\s*:\s*\K.*'

Here,

Page\(s\) - matches Page(s)
\s*:\s* - matches : surrounded by zero or more whitespace characters
\K - omits the text matched so far
.* - matches the rest of the line.
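
The pattern can be tried on a single sample line without running nconvert. This sketch assumes GNU grep built with PCRE support (the -P option); the variable names are illustrative only. It compares the .* tail with a stricter \d+ tail, which ignores anything after the number.

```shell
# One line copied from the nconvert output in the question.
line='    Page(s)              : 30'

# \K discards "Page(s)", the colon, and the surrounding whitespace from the match.
rest=$(printf '%s\n' "$line" | grep -oP 'Page\(s\)\s*:\s*\K.*')

# Same idea restricted to digits; the trailing spaces added here are not captured.
digits=$(printf '%s\n' "$line  " | grep -oP 'Page\(s\)\s*:\s*\K\d+')

echo "rest=[$rest] digits=[$digits]"
```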

Answer 3

nconvert -info file.tiff |
sed -n '/^[[:space:]]*Page(s)/{s/^[^[:digit:]]*//;p;}'

should do it.

Explanation

-n stops sed from printing every line, which it does by default.
/pattern/ is self-explanatory: to search for a pattern, enclose it in two forward slashes.
/^pattern/ looks for a pattern at the beginning of the line.
/^[[:space:]]*Page(s)/ looks for any amount of whitespace at the beginning of a line followed by Page(s). In sed's default basic regular expressions, unescaped parentheses are literal; \( and \) would open and close a group instead.
If a line matches, sed runs the commands inside the curly braces {commands}.
The first command is substitute, which has the format s/pattern/substitution/.
[] in a regex denotes a bracket expression, for example [A-Z] or [0-9].
Character classes are also available, and [:digit:] is the same as 0-9. Putting a ^ at the beginning of [] negates the bracket expression. So s/^[^[:digit:]]*// deletes any non-digit characters at the beginning of the line. Note that * means zero or more times.
The p at the end prints the line. Also note that the s and p commands are separated by a semicolon.
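
The sed steps can be checked against a few sample lines instead of live nconvert output. This is a sketch with illustrative variable names; it relies on the fact that parentheses are literal when left unescaped in sed's basic regular expressions. The second command is an equivalent variant that folds the address into a single substitution with the p flag.

```shell
# Sample lines copied from the nconvert output in the question.
sample='    Ydpi                 : 600
    Page(s)              : 30
    Info:'

# The address selects the Page(s) line; s/// strips leading non-digits; p prints it.
pages=$(printf '%s\n' "$sample" |
    sed -n '/^[[:space:]]*Page(s)/{s/^[^[:digit:]]*//;p;}')

# Single substitution: the p flag prints only the lines where s/// succeeded.
pages_short=$(printf '%s\n' "$sample" |
    sed -n 's/^[[:space:]]*Page(s)[^[:digit:]]*//p')

echo "$pages $pages_short"
```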

3 answers

solution1
1 2017-06-27 09:56:04

solution2
1 2017-06-27 09:56:49

solution3
1 2017-06-27 09:57:25
