Search in a webpage using bash

Question

I am trying to retrieve a webpage, search it for a pattern, extract the matching value, and do some calculations with it. My problem is that I can't figure out how to search for the pattern in a given string.

Let's say I retrieve a page like this:

content=$(curl -L http://google.com)

Now I want to search for a value I'm interested in, which is basically an HTML tag:

<div class="digits">123,456,789</div>

I tried to find this using sed. My attempt looked like this:

n=$(echo "$content"|sed '<div class=\"digits\">(\\d\\d,\\d\\d\\d,\\d\\d\\d)</div>')

I want to pull that value every, let's say, 10 minutes, save it, and estimate when 124,xxx,xxx will be reached.

I don't really know how to save those values, but I think I can figure that out on my own. I'm more interested in how to retrieve that substring, as I always get an error because of the "<".

I hope someone is able and willing to help me :)

Answer 1

It is better to use a proper parser with XPath:

xmllint --html --xpath '//*[@class="digits"]' http://domain.tld/

But it seems that the example URL you gave in the comments doesn't contain this class name. You can check this by running the following first:

curl -Ls url | grep -oP '<div\s+class="digits">\K[^<]+'
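As a side note on the sed attempt in the question: sed treats its argument as a script, and "<" is not a sed command, which is most likely where the error comes from. sed also has no \d, so [0-9] is needed instead. The following is only a minimal sketch of the same extraction done with sed; extract_digits is a hypothetical helper name, and the sample input is the tag from the question rather than a live page.

```shell
#!/usr/bin/env bash
# extract_digits: hypothetical helper. Reads HTML on stdin and prints the
# text inside <div class="digits">...</div> for each line that contains it.
extract_digits() {
    # -n suppresses the default output; the trailing "p" prints only the
    # lines where the substitution succeeded.
    # [0-9,] stands in for \d, which sed does not support.
    sed -n 's/.*<div class="digits">\([0-9,]*\)<\/div>.*/\1/p'
}

# Sample input taken from the question instead of a real download.
content='<div class="digits">123,456,789</div>'

n=$(printf '%s\n' "$content" | extract_digits)
echo "$n"                              # prints 123,456,789

# Remove the thousands separators so the value can be used in arithmetic.
plain=$(printf '%s\n' "$n" | tr -d ',')
echo "$plain"                          # prints 123456789
```

With a real page, the same function could be fed from curl, for example: curl -Ls url | extract_digits. Like the grep variant, this only works when the whole tag sits on one line, which is one reason a real parser is the safer choice.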

Answer 2

It's best to use a proper parser as @sputnick suggested.

Or you can try something like this:

curl -Ls url | perl -ne 'print "$1\n" if m{<div class="digits">([\d,]+)</div>}'
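Once the value has been extracted and the commas removed, the estimate the question asks for could be done with plain shell arithmetic. This is only a sketch under the assumption that the counter grows roughly linearly; estimate_eta is a hypothetical helper, and the two samples below are made-up numbers, not real measurements.

```shell
#!/usr/bin/env bash
# estimate_eta: hypothetical helper. Takes two samples (time in epoch
# seconds and counter value) plus a target value, and prints the epoch
# second at which the target would be reached if growth stayed linear.
estimate_eta() {
    local t1=$1 v1=$2 t2=$3 v2=$4 target=$5
    if [ "$v2" -le "$v1" ]; then
        echo "counter is not increasing, cannot extrapolate" >&2
        return 1
    fi
    # Integer arithmetic: remaining count multiplied by seconds per count.
    echo $(( t2 + (target - v2) * (t2 - t1) / (v2 - v1) ))
}

# Two made-up samples taken 600 seconds (10 minutes) apart.
estimate_eta 0 123456789 600 123457789 124000000   # prints 325926
```

Each sample could be appended to a log file with something like printf '%s %s\n' "$(date +%s)" "$value" >> digits.log, where digits.log is an arbitrary file name, and the script could be run every 10 minutes from cron or from a loop with sleep 600.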

2 answers

solution1
1 2013-11-23 22:25:20

solution2
0 2013-11-24 07:30:07