The nconvert utility prints information about a file:
nconvert -info file.tiff
Output:
** NCONVERT v7.00 (c) 1991-2017 Pierre-E Gougelet (Apr 18 2017/09:49:26) **
Version for Windows NT/9x/2000/Xp/Vista/7 (All rights reserved)
** This is freeware software (for non-commercial use)
Over...
file.tiff : Success
Format : TIFF
Name : tiff
Compression : CCITT Group 4
Width : 3194
Height : 5056
Components per pixel : 1
Bits per component : 1
Depth : 1
# colors : 2
Color model : RGB
Bytes Per Plane : 400
Orientation : Top Left
Xdpi : 600
Ydpi : 600
Page(s) : 30
Info:
Photometric Interpretation: White=0
PhotometricInterpretation: 0
PlanarConfiguration: 1
SamplesPerPixel: 1
Software: LIBFORMAT (c) Pierre-e Gougelet
Metadata : ( EXIF )
I need to extract numeric values from this output using grep. For example, to get the number of pages I use
nconvert -info file.tiff | grep -oP "(?<=Page\(s\)).*$"
and I get:
: 30
But I need only the number 30.
The modification below does not give the desired result either:
nconvert -info file.tiff | grep -oP "(?<=Page\(s\)\s+\:).*$"
How can I get the value after the colon?
You can use this grep command:
nconvert -info file.tiff | grep -oP 'Page\(s\)\h*:\h*\K\d+'
30
\K resets the start of the match, so everything matched before it is left out of the output.
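To try the \K pattern without nconvert installed, the sketch below feeds grep a few lines copied from the output shown in the question. The sample_info function is a hypothetical stand-in for nconvert -info, and -P assumes a grep built with PCRE support, which GNU grep usually is.

```shell
#!/bin/sh
# Hypothetical stand-in for `nconvert -info file.tiff`; the lines are
# copied from the output shown in the question.
sample_info() {
    printf '%s\n' \
        'Xdpi : 600' \
        'Ydpi : 600' \
        'Page(s) : 30'
}

# \K discards everything matched before it, so -o prints only the digits.
pages=$(sample_info | grep -oP 'Page\(s\)\h*:\h*\K\d+')
echo "pages=$pages"
```

Capturing the result in a variable, as done here with pages, is usually what a script wants next.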
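You can also use awk:
nconvert -info file.tiff | awk -F '[: \t]*' '$2=="Page(s)"{print $3}'
30
This relies on the line starting with whitespace, which makes the first field empty and the label $2.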
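If several values are needed, a small helper can look up any label. The sketch below is a variation, not the command above: instead of a multi-character field separator it splits each line on its first colon and trims both sides, so it does not depend on indentation. The names sample_info and get_field are hypothetical, and sample_info stands in for nconvert -info.

```shell
#!/bin/sh
# Hypothetical stand-in for `nconvert -info file.tiff`; the lines are
# copied from the output shown in the question.
sample_info() {
    printf '%s\n' \
        'Width : 3194' \
        'Height : 5056' \
        'Xdpi : 600' \
        'Page(s) : 30'
}

# get_field LABEL: print the value that follows "LABEL :" (hypothetical helper).
get_field() {
    awk -v label="$1" '
        {
            pos = index($0, ":")              # position of the first colon
            if (pos == 0) next                # skip lines without a colon
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)  # trim the label
            gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim the value
            if (key == label) print val
        }'
}

width=$(sample_info | get_field 'Width')
pages=$(sample_info | get_field 'Page(s)')
echo "width=$width pages=$pages"
```

Because the label is compared as a plain string, the parentheses in Page(s) need no escaping here.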
You need to replace the positive lookbehind with the \K match reset operator, which allows a variable-width pattern before the value you need to extract:
grep -oP 'Page\(s\)\s*:\s*\K.*'
Here,
Page\(s\) - matches Page(s)
\s*:\s* - matches : enclosed with 0+ whitespaces
\K - omits the text matched so far
.* - matches the rest of the line.
nconvert -info file.tiff | sed -n '/^[[:space:]]*Page(s)/{s/^[^[:digit:]]*//;p;}'
should do it.
Explanation
-n stops sed from printing every line to the output; by default it prints everything.
/pattern/ is self-explanatory: to look for a pattern, enclose it in two forward slashes.
/^pattern/ looks for a pattern at the beginning of the line.
/^[[:space:]]*Page(s)/ looks for any number of whitespace characters at the beginning of a line followed by Page(s). The parentheses are left unescaped because in sed's basic regular expressions \( and \) delimit a group, so Page\(s\) would match Pages rather than Page(s).
sed runs the commands inside the curly braces {commands} on each matching line.
s/pattern/substitution/ replaces the text matched by pattern with substitution.
[] in a regex denotes a set of characters, for example [A-Z] or [0-9].
The [:digit:] character class, used inside brackets as [[:digit:]], is the same as [0-9].
Putting a ^ at the beginning of [] negates that set. So, in short, s/^[^[:digit:]]*// deletes any non-digit characters at the beginning of the line. Note that * means zero or more times.
p at the end prints the line. Also note that the s and p commands are separated by a semicolon.
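The sed steps can be checked on sample text. In this sketch sample_info is a hypothetical stand-in for nconvert -info, and the indentation of the Page(s) line is made up to exercise the [[:space:]]* part of the address.

```shell
#!/bin/sh
# Hypothetical stand-in for `nconvert -info file.tiff`; the indentation
# of the Page(s) line is an assumption made for illustration.
sample_info() {
    printf '%s\n' \
        'Ydpi : 600' \
        '    Page(s) : 30' \
        'Info:'
}

# -n prints nothing by default; the address selects the Page(s) line
# (unescaped parentheses are literal in basic regular expressions);
# s/// strips the leading non-digits; p prints what is left.
pages=$(sample_info | sed -n '/^[[:space:]]*Page(s)/{s/^[^[:digit:]]*//;p;}')
echo "pages=$pages"
```

The semicolon before the closing brace is there for portability, since some sed implementations require it.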