Regex for capturing Curl HTTP Status Code and body response

I'm attempting to create a regex that captures both the HTTP status code as well as the body of a curl request. The regex pattern below works on multiple online sites, but won't match in a shell if-statement on my Mac's command line. Is my regex off or is there something else going on?

RESPONSE=$(curl -s -i -X GET http://www.google.com/)

# Match and capture the status code, match the headers, match two new lines, match and capture an optional body
re="^HTTP\/\d\.\d\s([\d]{3})[\w\d\s\W\D\S]*[\r\n]{2}([\w\d\s\W\D\S]*)?$"

if [[ "${RESPONSE}" =~ $re ]]; then
  echo "match"
  # Now do stuff with the captured groups, "${BASH_REMATCH[...]}"
else
  echo "no match"
fi

I'm also open to other ways of doing this (I'm targeting a machine running CentOS 5).
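
A likely reason the pattern matches in online testers but not in the shell: bash's `=~` uses POSIX extended regular expressions from the system regex library, while most online testers use Perl-style syntax. `\d`, `\s` and `\w` are not defined by POSIX ERE, and inside a bracket expression such as `[\r\n]` the backslash is a literal character. Below is a minimal sketch that avoids those shorthands and splits the body off with parameter expansion instead of a second capture group. The canned `RESPONSE` and the names `status`, `body` and `sep` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical canned response so the sketch runs without network access;
# in practice RESPONSE would come from: curl -s -i http://www.google.com/
RESPONSE=$'HTTP/1.1 302 Found\r\nLocation: http://www.google.com/\r\n\r\n<html>\nmoved\n</html>'

# POSIX ERE only: [0-9] instead of \d, a literal space instead of \s.
re='^HTTP/[0-9]+(\.[0-9]+)? ([0-9]{3})'
sep=$'\r\n\r\n'   # the blank CRLF line that ends the headers

if [[ $RESPONSE =~ $re ]]; then
  status=${BASH_REMATCH[2]}
  # Drop everything through the first blank line. If no blank line is
  # present, body ends up equal to the whole response.
  body=${RESPONSE#*"$sep"}
  echo "status: $status"
  printf '%s\n' "$body"
else
  echo "no match"
fi
```

Keeping the regex to the status line sidesteps the fact that ERE has no non-greedy `.*`, which would otherwise push the header/body split to the last blank line in the response rather than the first.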

Same idea as @delarsschneider, slightly less complicated

RESPONSE=$(curl -s -i -X GET http://www.google.com/)

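# Status code: the digits after the protocol version on the first line
CODE=$(printf '%s\n' "$RESPONSE" | sed -n '1s/^HTTP[^ ]* \([0-9][0-9]*\).*/\1/p')

# Body: drop carriage returns, then delete everything through the first blank line
BODY=$(printf '%s\n' "$RESPONSE" | tr -d '\r' | sed '1,/^$/d')

echo "$CODE"
echo "$BODY"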
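
If the headers are not needed, another option is to let curl report the status code itself through its `-w` (`--write-out`) option and the `%{http_code}` variable, so no header parsing is required. A minimal sketch under that assumption: `split_response` is a hypothetical helper, not part of curl, and the canned string stands in for real curl output so the example runs offline.

```shell
#!/usr/bin/env bash
# split_response is a hypothetical helper. It expects the body, a newline,
# and then the status code, which is the shape printed by
#   curl -s -w '\n%{http_code}' URL
# (no -i here, so headers are not part of the output).
split_response() {
  local out=$1
  local nl=$'\n'
  STATUS=${out##*"$nl"}   # text after the last newline
  BODY=${out%"$nl"*}      # text before the last newline
}

# Canned output standing in for a real curl call.
split_response $'<html>\nmoved\n</html>\n302'
echo "status: $STATUS"
printf '%s\n' "$BODY"
```

With a live request the call would look like `split_response "$(curl -s -w '\n%{http_code}' http://www.google.com/)"`.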

Since you are open to other solutions, too, you can try this out.

RESPONSE=$(curl -s -i -X GET http://www.google.com/)

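# Quote "$RESPONSE" so the line breaks survive; unquoted, echo joins everything into one line
HTTP_STATUS_CODE=$(echo "$RESPONSE" | sed '
  /HTTP/ {
    s/^HTTP[^ ]* //
    s/ .*$//
    q
  }
  D')

BODY=$(echo "$RESPONSE" | sed '
  /^.$/ {
    :body
    n
    b body
  }
  D')

echo "$HTTP_STATUS_CODE"
echo "$BODY"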

HTTP_STATUS_CODE is taken from the first line that contains HTTP. The first substitution removes everything up to and including the first space, leaving the status code and reason phrase ('302 Found'). The second removes everything from the next space to the end of the line, leaving only the code.

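BODY starts at the first line that consists of a single character; earlier lines are deleted with 'D'. That line is the blank line that ends the headers: HTTP headers use CRLF line endings, so the blank line still holds one carriage return. From there, every line (the separator line included) is printed until the end of the input.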
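
To try the steps described above without a network call, here is a small sketch that runs them on a canned response. `SAMPLE` and the variable names are made up for illustration. The body is selected with a range address rather than the loop, which should pick the same lines, and the separator line is then dropped.

```shell
#!/usr/bin/env bash
# Hypothetical canned response with CRLF line endings, standing in for curl -i output.
SAMPLE=$'HTTP/1.1 302 Found\r\nLocation: http://www.google.com/\r\n\r\n<html>moved</html>'

# Status code, one substitution at a time.
status_line=$(printf '%s\n' "$SAMPLE" | sed -n '1p')
step1=$(printf '%s\n' "$status_line" | sed 's/^HTTP[^ ]* //')  # "302 Found" plus a trailing CR
step2=$(printf '%s\n' "$step1" | sed 's/ .*$//')               # "302"
echo "$step2"

# Body: print from the first one-character line (the lone CR) to the end,
# then delete that separator line.
body=$(printf '%s\n' "$SAMPLE" | sed -n '/^.$/,$p' | sed '1d')
echo "$body"
```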
