How to grep for a string pattern from command output in shell script?

I am compressing my PDF file using Ghostscript, which throws an error when the file is password protected. I need to handle that case.

Shell script

GS_RES=`gs -sDEVICE=pdfwrite -sOutputFile=$gsoutputfile -dNOPAUSE -dBATCH $2 2>&1`

if [ "$GS_RES" != "" ]
then
    gspassmsg="This file requires a password for access"
    echo "Error message is :::::: "$GS_RES
    gspassworddoc=`awk -v a="$GS_RES" -v b="$gspassmsg" 'BEGIN{print index(a,b)}'`
    if [ $gspassworddoc -ne 0 ]
    then
        exit 3 #error code - password protected pdf
    fi
fi

After running the command, my GS_RES value looks like the following:

Error message 1:

GPL Ghostscript 9.19 (2016-03-23) Copyright (C) 2016 Artifex Software, Inc. All rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for details.
Error: /syntaxerror in -file-
Operand stack:
Execution stack: %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1967 1 3 %oparray_pop 1966 1 3 %oparray_pop 1950 1 3 %oparray_pop 1836 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push
Dictionary stack: --dict:1196/1684(ro)(G)-- --dict:0/20(G)-- --dict:78/200(L)--
Current allocation mode is local
Current file position is 1

Error message 2:

GPL Ghostscript 9.19 (2016-03-23) Copyright (C) 2016 Artifex Software, Inc. All rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for details.
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: Cannot find a 'startxref' anywhere in the file. Output may be incorrect.
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: An error occurred while reading an XREF table.
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html The file has been damaged. This may have been caused
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html by a problem while converting or transfering the file.
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Ghostscript will attempt to recover the data.
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html However, the output may be incorrect.
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: Trailer dictionary not found. Output may be incorrect.
No pages will be processed (FirstPage > LastPage).
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html This file had errors that were repaired or ignored.
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Please notify the author of the software that produced this
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html file that it does not conform to Adobe's published PDF
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html specification.
gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html The rendered output from this file may be incorrect.

When I run awk on Error message 2:

gspassmsg="This file requires a password for access"
gspassworddoc=`awk -v a="$GS_RES" -v b="$gspassmsg" 'BEGIN{print index(a,b)}'`

it fails with the following error:

Error : awk: newline in string GPL Ghostscript 9.19... at source line 1

Error message 3:

   **** Error: Cannot find a 'startxref' anywhere in the file.
   **** Warning:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
   **** Error:  Trailer is not found.

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

I couldn't capture this error with the snippet from the answer below:

if ! gs_res=$(gs -sDEVICE=pdfwrite -sOutputFile="$gsoutputfile" -dNOPAUSE -dBATCH "$2" 2>&1 1>/dev/null); then
  echo "Error message is :::::: $gs_res" >&2
  gspassmsg='This file requires a password for access'
  [[ $gs_res == *"$gspassmsg"* ]] && exit 3 # password protected pdf
  echo "Some other error !"
fi

Please clarify the following:

  1. Why does awk behave strangely here? What am I missing?
  2. How do I grep for a pattern in a string that contains special characters?
  3. Does Ghostscript have predefined error messages like these? If possible, please suggest some documentation to refer to.
  4. Is it possible to compress a password-protected PDF with Ghostscript?
  5. How can I make sure the gs compression succeeded in the above case? I may not know all the different errors Ghostscript can throw, so I can't cross-check them against the result of my command.

I am quite new to shell scripting. Could someone please help me with this?

PS: I have edited my question with additional details. Please take a look. If anything else needs to be added, I'll add it.

Ghostscript's error messages all follow the same pattern; however, there are some gotchas:

Part of the output is a dump of the operand stack at the time of the error. Since PostScript is a programming language, the contents of the stack depend on the program and are entirely unpredictable. Even though you are dealing with PDF files, not PostScript programs, the interpreter is itself written in PostScript, so the same still applies.

The error name in 'Error: /syntaxerror...' is limited to a small number of possible errors, which are defined in the PostScript Language Reference Manual.
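Because only that error name is predictable (the stack dump after it is not), a script can extract the name instead of matching the whole message. This is a minimal bash sketch, assuming the output has already been captured into a variable as in the question. gs_error_name is a hypothetical helper, and the regex assumes the "Error: /name" form seen in the outputs above; messages such as "Error: Cannot find a 'startxref'" have no /name and yield nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PostScript error name (e.g. "syntaxerror")
# found in captured Ghostscript output, or print nothing if there is none.
gs_error_name() {
  local output=$1
  # Match "Error: /<name>"; only the name is captured, not the stack dump.
  if [[ $output =~ Error:\ /([[:alpha:]]+) ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Example using a shortened copy of "Error message 1" from the question.
sample='GPL Ghostscript 9.19 (2016-03-23) ... Error: /syntaxerror in -file- Operand stack: ...'
gs_error_name "$sample"   # prints: syntaxerror
```

The result can then be compared against the short list of names from the manual rather than against the full, unpredictable message.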

PostScript (but not PDF) programs can install an error handler, which can totally alter the error output, and even swallow the error altogether.

As regards 'compressing PDF files', that is absolutely not what you are doing. Please have a read here, which explains what's actually happening. In short, though, you are producing a new PDF file, not compressing an old one.

You can, of course, process a password-protected PDF file with Ghostscript, as long as you know the password. Look for PDFPassword in the documentation here.
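For illustration, a sketch of passing the password on the command line with -sPDFPassword, the parameter referred to in the documentation. The function name rewrite_pdf and its argument order are hypothetical; the remaining options are the ones from the question's command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: rewrite an (optionally encrypted) PDF with pdfwrite.
# Usage: rewrite_pdf INPUT OUTPUT [PASSWORD]
rewrite_pdf() {
  local input=$1 output=$2 password=${3-}
  local -a args=(-sDEVICE=pdfwrite -sOutputFile="$output" -dNOPAUSE -dBATCH)
  # Only add -sPDFPassword when a password was actually supplied.
  if [[ -n $password ]]; then
    args+=(-sPDFPassword="$password")
  fi
  gs "${args[@]}" "$input"
}

# Example (assumes gs is installed and in.pdf exists):
#   rewrite_pdf in.pdf out.pdf 'my secret'
```

Building the arguments in an array keeps a password containing spaces intact as a single argument.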

The error message you quote above is not due to the file being encrypted (password protected); there's something else wrong with it. In fact, given the simple command line you are using, I'd say there's something quite seriously wrong with it. Of course, without seeing the file I can't tell for certain.

If a file is encrypted, the output from Ghostscript should read something like:

GPL Ghostscript GIT PRERELEASE 9.21 (2016-09-14) Copyright (C) 2016 Artifex Software, Inc. All rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for details.

**** This file requires a password for access.

Error: /invalidfileaccess in pdf_process_Encrypt

Operand stack:

Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_pop 1966 1 3 %oparray_pop --nostringval-- --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push

Dictionary stack:
--dict:1199/1684(ro)(G)-- --dict:1/20(G)-- --dict:83/200(L)-- --dict:83/200(L)-- --dict:135/256(ro)(G)-- --dict:291/300(ro)(G)-- --dict:26/32(L)--

Current allocation mode is local
GPL Ghostscript GIT PRERELEASE 9.21: Unrecoverable error, exit code 1

So simply grepping for "This file requires a password" should be enough to identify encrypted files.
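A minimal bash sketch of that check, which also touches on question 2 (matching text that contains special characters): grep -F treats the pattern as a fixed string, so characters such as ".", "*" or "/" carry no special meaning, and -q reports the result only through the exit status. is_encrypted_pdf_output is a hypothetical name, and the sample text is abbreviated from the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if captured Ghostscript
# output says the input PDF requires a password.
is_encrypted_pdf_output() {
  # -F: fixed-string match, no regex metacharacters; -q: status only.
  # The here-string feeds the (possibly multi-line) text to grep unchanged.
  grep -qF 'This file requires a password' <<<"$1"
}

gs_res='**** This file requires a password for access.
Error: /invalidfileaccess in pdf_process_Encrypt'
if is_encrypted_pdf_output "$gs_res"; then
  echo "password protected"
fi
```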

Now, as noted by mklement0, if you'd like to explain what it is about your actual script which is causing a problem, perhaps we can help with that too. You haven't shown the output of your script, or explained what is not working as you expect.

KenS's helpful answer addresses your questions about Ghostscript itself.
Here's a streamlined version of your code that should work:

# Run `gs` and capture its stderr output.
gs_res=$(gs -sDEVICE=pdfwrite -sOutputFile="$gsoutputfile" -dNOPAUSE -dBATCH "$2" 2>&1 1>/dev/null)
ec=$? # Save gs's exit code.

# Assume that something went wrong, IF:
#   - gs reported a nonzero exit code
#   - but *also* if any stderr output was produced, as
#     not all problems may be reflected in a nonzero exit code.
if [[ $ec -ne 0 || -n $gs_res ]]; then
  echo "Error message is :::::: $gs_res" >&2
  gspassmsg='This file requires a password for access'
  [[ $gs_res == *"$gspassmsg"* ]] && exit 3 # password protected pdf
fi

  • I've double-quoted the variable and parameter references in your gs command.

  • I've changed your redirection from just 2>&1 to 2>&1 1>/dev/null so as to capture only stderr output.

    • 2>&1 redirects stderr (2) to the (still-original) stdout (1), so that error messages are sent to stdout and can be captured as part of the command substitution ($(...)); 1>/dev/null then redirects stdout to the null device, effectively silencing all stdout output. Note that the earlier redirection of stderr to the original stdout is not affected by this, so in effect what the overall command sends to stdout is the original stderr output only.
      If you want to know more, see this answer of mine.
  • I'm using the more modern and flexible $(...) command-substitution syntax instead of the legacy `...` form (for background information, see here).

  • I've renamed GS_RES to gs_res, because it is better not to use all-uppercase shell-variable names, in order to avoid conflicts with environment variables and special shell variables.

  • I'm using simple pattern matching to find the desired substring in gs's stderr output. Given that you already have the input to test against in a variable, Bash's own string-matching features (which are actually quite varied) will do, and there is no need to use an external utility such as awk.
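The redirection and quoting points above can be tried without gs. In this small demo, noisy is a made-up function standing in for gs, writing one line to stdout and one to stderr. The last part uses a temporary directory holding a dummy a.pdf to show what happens when a message containing "****" is echoed unquoted: the shell expands it as a glob. That is likely why file names such as gs.pdf appear in Error message 2, since the question's echo leaves $GS_RES unquoted.

```shell
#!/usr/bin/env bash
# Made-up stand-in for gs: one line on stdout, one on stderr.
noisy() {
  echo "normal output"            # goes to stdout
  echo "**** Error: oops" >&2     # goes to stderr
}

# 2>&1 first: stderr -> the captured stdout; then 1>/dev/null: stdout discarded.
only_err=$(noisy 2>&1 1>/dev/null)
echo "captured: $only_err"        # captured: **** Error: oops

# Opposite order: stdout -> /dev/null first, then stderr -> (now) /dev/null.
nothing=$(noisy 1>/dev/null 2>&1)
echo "captured: [$nothing]"       # captured: []

# Quoting: unquoted, the '****' in the message is expanded as a glob pattern.
tmp=$(mktemp -d)
touch "$tmp/a.pdf"
globbed=$(cd "$tmp" && echo $only_err)
echo "$globbed"                   # a.pdf Error: oops
rm -r "$tmp"
```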


As for why your awk command failed:

It sounds like you're using BSD awk, such as the one that comes with macOS as of 10.12 (your question is tagged linux, however):

BSD awk doesn't support newlines in variable values passed via -v unless you \-escape the newlines.
With unescaped multi-line strings, your awk call fails fundamentally, before index() is ever called.

By contrast, GNU Awk and Mawk do support multi-line strings passed as-is via -v.

Read on for optional background information.


To determine which awk implementation you're using, run awk --version and examine the output:

  • awk version 20070501 -> BSD Awk

  • GNU Awk 4.1.3, API: 1.1 ... -> GNU Awk

  • mawk: not an option: --version -> Mawk
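If you want to script that check, here is a rough sketch based only on the three --version outputs listed above. awk_flavor is a hypothetical name, and any other Awk implementation will simply be reported as unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the awk in PATH by its --version output.
awk_flavor() {
  local v
  # mawk reports the unknown option on stderr, so capture both streams;
  # </dev/null guarantees awk never waits for input.
  v=$(awk --version 2>&1 </dev/null)
  case $v in
    'GNU Awk'*)     echo gnu ;;
    *mawk*)         echo mawk ;;
    'awk version'*) echo bsd ;;
    *)              echo unknown ;;
  esac
}

awk_flavor
```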

Here's a simple test to try with your Awk version:

awk -v a=$'1\n2' -v b=2 'BEGIN { print index(a, b) }'

GNU Awk and Mawk output 3, as expected, whereas BSD Awk fails with awk: newline in string 1.

Also note that \-escaping newlines works ONLY in BSD Awk (e.g.,
awk -v var=$'1\\\n2' 'BEGIN { print var }'), which unfortunately means that there is no portable way to pass multi-line variable values to Awk.
