
Understanding sed commands

I need to understand a shell script that uses the following command to fetch directions from a source to a destination using the Google Maps API:

wget --no-parent -O - https://maps.googleapis.com/maps/api/directions/json?origin=$begin\&destination=$finish\&sensor=false > new.txt

Next, we extract the following line from the output with grep:

**"html_instructions" : "Head \u003cb\u003enorthwest\u003c/b\u003e"**

grep -n html_instructions  new.txt > new1.txt

Can somebody please tell me the meaning of using:

sed -e 's/\\u003cb//g'

etc in the following command:

sed -e 's/\\u003cb//g' -e 's/\\u003e//g' -e 's/\\u003c\/b//g' -e 's/\\u003c//g' -e 's/div.*div//g' -e 's/.*://g' -e 's/"//g' -e 's/ "//g' new1.txt > new2.txt

This outputs only Head northwest.

Thanks in advance!

sed -e 's/\\u003cb//g' -e 's/\\u003e//g' -e 's/\\u003c\/b//g' -e 's/\\u003c//g' -e 's/div.*div//g' -e 's/.*://g' -e 's/"//g' -e 's/ "//g' new1.txt > new2.txt

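The string after each -e is a sed command. The command s/\\u003cb//g searches for every occurrence of the literal text \u003cb and replaces it with nothing. In other words, it removes that text from the line. Inside the regular expression, \\ matches a single literal backslash. \u003c is the JSON escape for the character < (U+003C) and \u003e is the escape for > (U+003E), so \u003cb\u003e is an escaped <b> tag and \u003c/b\u003e is an escaped </b> tag. The first four commands therefore strip the HTML bold markup from the instruction text.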
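As a rough illustration, the first three expressions can be applied one at a time to the sample line from the question. This is only a sketch: the variable names line, step1, step2 and step3 are made up for the example, and the sample line omits the line-number prefix that grep -n adds and any trailing comma the real API output may contain.

```shell
# Sample line copied from the question; single quotes keep the backslashes literal.
line='"html_instructions" : "Head \u003cb\u003enorthwest\u003c/b\u003e"'

# Each expression deletes one literal escape sequence; \\ matches one backslash.
step1=$(printf '%s\n' "$line"  | sed -e 's/\\u003cb//g')    # drops \u003cb   (escaped "<b")
step2=$(printf '%s\n' "$step1" | sed -e 's/\\u003e//g')     # drops \u003e    (escaped ">")
step3=$(printf '%s\n' "$step2" | sed -e 's/\\u003c\/b//g')  # drops \u003c/b  (escaped "</b")

printf '%s\n' "$step1" "$step2" "$step3"
```

After the third step the line reads "html_instructions" : "Head northwest", so the remaining expressions only have to remove the key and the quotes.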

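The command s/div.*div//g removes everything from the first div to the last div on the line. s/.*://g removes all text from the beginning of the line through the last colon on the line. s/"//g removes every occurrence of the double-quote character. s/ "//g removes every occurrence of a space followed by a double quote; because it runs after s/"//g has already removed every double quote, it has nothing left to match here.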

With a g appended at the end, as in s/old/new/g, the substitution is made globally: sed looks for every occurrence of old on the line and replaces it with new. What gives these commands a lot of power is that old may be a regular expression. Consider s/.*://g. The dot character has the special meaning of "any character at all". The star character means zero or more of the preceding character. Thus the regular expression .*: means zero or more of any characters followed by a colon.
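To see the greedy behaviour of .*: and the combined effect of all eight expressions, something like the following can be run without touching the network. It is a sketch under assumptions: the names line, greedy and result are illustrative, and the input is the sample line from the question rather than real API output.

```shell
line='"html_instructions" : "Head \u003cb\u003enorthwest\u003c/b\u003e"'

# .* is greedy, so .*: consumes everything up to the LAST colon on the line.
greedy=$(printf '%s\n' 'a:b:c' | sed -e 's/.*://')

# The full chain of expressions from the question, applied to the sample line.
result=$(printf '%s\n' "$line" | sed -e 's/\\u003cb//g' -e 's/\\u003e//g' \
    -e 's/\\u003c\/b//g' -e 's/\\u003c//g' -e 's/div.*div//g' \
    -e 's/.*://g' -e 's/"//g' -e 's/ "//g')

# Brackets make leading whitespace visible.
printf '[%s]\n' "$greedy" "$result"
```

This prints [c] and [ Head northwest]. The leading space survives because s/.*://g stops at the colon and the space after it is never removed.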

You can do it all in one go with awk:

awk -F\" '/html_instructions/ {gsub(/(\\u003(c|cb|e)|\/b)/,x);print $4}'
Head northwest

So the whole line should be:

wget --no-parent -O - https://maps.googleapis.com/maps/api/directions/json?origin=$begin\&destination=$finish\&sensor=false | awk -F\" '/html_instructions/ {gsub(/(\\u003(c|cb|e)|\/b)/,x);print $4}'
Head northwest

To get it into a variable:

d=$(wget --no-parent -O - https://maps.googleapis.com/maps/api/directions/json?origin=$begin\&destination=$finish\&sensor=false | awk -F\" '/html_instructions/ {gsub(/(\\u003(c|cb|e)|\/b)/,x);print $4}')
echo $d
Head northwest
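The awk approach can be tried offline by feeding it the sample line from the question instead of the wget output. This is a sketch, not the exact command above: it lists the cb alternative before c and passes an explicit "" as the replacement rather than the unset variable x, and the names line and d are only illustrative.

```shell
line='"html_instructions" : "Head \u003cb\u003enorthwest\u003c/b\u003e"'

# -F'"' splits on double quotes, so the instruction text is field 4.
# gsub edits $0, which makes awk re-split the fields before print runs.
d=$(printf '%s\n' "$line" |
    awk -F'"' '/html_instructions/ { gsub(/\\u003(cb|c|e)|\/b/, ""); print $4 }')

printf '%s\n' "$d"
```

Because the text is taken from a field between two quotes, the result has no leading space and no trailing comma, unlike the output of the sed chain.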


 