
Search in a webpage using bash

I am trying to retrieve a webpage, search it for a pattern, extract the matching value, and do some calculations with it. My problem is that I can't figure out how to search for the pattern in a given string.

Let's say I retrieve a page like this:

content=$(curl -L http://google.com)

Now I want to search for the value I'm interested in, which sits inside an HTML tag:

<div class="digits">123,456,789</div>

Now, I did try to find this using sed. My attempt looked like this:

n=$(echo "$content"|sed '<div class=\"digits\">(\\d\\d,\\d\\d\\d,\\d\\d\\d)</div>')

I want to pull that value every 10 minutes or so, save it, and estimate when 124,xxx,xxx will be reached.

My problem is that I don't really know how to save those values, but I think I can figure that out on my own. I'm more interested in how to retrieve that substring, as I always get an error because of the "<".

I hope someone is able and willing to help me :)

Better to use a proper HTML parser, such as xmllint:

xmllint --html --xpath '//*[@class="digits"]' http://domain.tld/ 

But it seems that the page at the example URL you gave in the comments doesn't contain this class name. You can check that first by running:

curl -Ls url | grep -oP '<div\s+class="digits">\K[^<]+'
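To do calculations with the result, the thousands separators have to go first, since bash arithmetic won't accept 123,456,789. Here is a minimal sketch of that step. It assumes GNU grep with PCRE support (-P), and it uses a hard-coded sample string in place of the real page; the function names extract_digits and to_number are made up for illustration.

```shell
#!/usr/bin/env bash

# Print the text inside the first <div class="digits">...</div> read from stdin.
# \K discards everything matched so far, so only the digits are printed.
# Assumes GNU grep built with PCRE support (-P).
extract_digits() {
    grep -oP '<div\s+class="digits">\K[^<]+' | head -n 1
}

# Remove thousands separators so the value can be used in $(( ... )).
to_number() {
    tr -d ','
}

# Hypothetical sample input standing in for the output of: curl -Ls url
html='<html><body><div class="digits">123,456,789</div></body></html>'

raw=$(printf '%s\n' "$html" | extract_digits)
n=$(printf '%s\n' "$raw" | to_number)

echo "raw=$raw n=$n"                        # raw=123,456,789 n=123456789
echo "remaining=$(( 124000000 - n ))"       # remaining=543211
```

Against the live page you would replace the printf with curl -Ls url and pipe that into extract_digits.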

It's best to use a proper parser, as @sputnick suggested.

Or you can try something like this:

curl -Ls url | perl -ne 'print "$1\n" if m{<div class="digits">([\d,]+)</div>}'
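Since the question started from sed, here is a rough sketch of how that attempt could be made to work, plus one possible way to do the estimate. The original sed call failed because it had no s/// command (sed read the leading "<" as an unknown command), and sed does not understand \d anyway. The estimate is a plain linear extrapolation between two samples, which is my assumption about what "estimate" should mean here. The function names and the sample values are hypothetical.

```shell
#!/usr/bin/env bash

# Extract the digits with sed and strip the commas.
# sed has no \d, so a bracket expression is used instead; -n together with
# the p flag prints only lines where the substitution succeeded.
extract_digits() {
    sed -n 's/.*<div class="digits">\([0-9,]*\)<\/div>.*/\1/p' | head -n 1 | tr -d ','
}

# Linear extrapolation from two samples.
# Arguments: t1 v1 t2 v2 target (times in epoch seconds, values as integers).
# Prints the epoch second at which target would be reached at the observed
# rate. Prints nothing and returns 1 if the value is not increasing.
estimate_eta() {
    local t1=$1 v1=$2 t2=$3 v2=$4 target=$5
    local dv=$(( v2 - v1 )) dt=$(( t2 - t1 ))
    (( dv > 0 && dt > 0 )) || return 1
    echo $(( t2 + (target - v2) * dt / dv ))
}

# Hypothetical samples taken 600 seconds (10 minutes) apart.
v1=$(echo '<div class="digits">123,456,789</div>' | extract_digits)
v2=$(echo '<div class="digits">123,457,389</div>' | extract_digits)

eta=$(estimate_eta 0 "$v1" 600 "$v2" 124000000)
echo "v1=$v1 v2=$v2 eta=$eta"    # v1=123456789 v2=123457389 eta=543211
```

In real use the two samples would come from curl -Ls url runs 10 minutes apart, with date +%s supplying the timestamps, and each pair could be appended to a file to keep a history.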
