Match URL pattern within file using SED, AWK or GREP
I am trying to use grep to extract a list of URLs beginning with http and ending with jpg.
grep -o 'picturesite.com/wp-content/uploads/.......' filename
The code above is as far as I've gotten. I then need to pass these file names to curl. A sample line from the file:
title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker"
You can capture URL patterns by doing:
grep -o 'http.*\.jpg' file
$ grep -o 'http.*\.jpg' <<EOF
> title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker
> EOF
http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg
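Because `.*` is greedy, the pattern above can overshoot when a line holds more than one URL. The following is a minimal sketch of a stricter variant, assuming the URLs never contain double quotes or whitespace. The sample line and the example.com addresses are made up for illustration.

```shell
# Hypothetical input with two URLs on one line.
line='a:"http://example.com/one.jpg", b:"http://example.com/two.jpg"'

# Greedy: a single match running from the first "http" to the last ".jpg".
printf '%s\n' "$line" | grep -o 'http.*\.jpg'

# Stricter: stop at a quote or whitespace, so each URL is printed on its own line.
printf '%s\n' "$line" | grep -o 'http[^"[:space:]]*\.jpg'
```

The bracket expression keeps each match inside its surrounding quotes, which is usually what you want before handing URLs to another tool.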
curl does not read URLs from standard input, so your best bet is to store the extracted URLs in a file, then read that file one line at a time and pass the variable holding each line to the curl command.
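The store-then-loop approach described above might look roughly like this. It is only a sketch: the function names `extract_urls` and `fetch_all`, the file names `input.txt` and `urls.txt`, and the choice of curl's `-sS -O` options are assumptions for illustration, not part of the original answer.

```shell
#!/bin/sh
# Print every jpg URL found in the file given as $1, one per line.
extract_urls() {
    # -o prints only the matched text; the bracket expression stops at a quote or whitespace.
    grep -o 'http[^"[:space:]]*\.jpg' "$1"
}

# Read the URL list in $1 line by line and hand each URL to curl.
fetch_all() {
    while IFS= read -r url; do
        # -O saves the download under its remote file name; -sS hides progress but shows errors.
        curl -sS -O "$url"
    done < "$1"
}

# Usage (hypothetical file names):
#   extract_urls input.txt > urls.txt
#   fetch_all urls.txt
```

Keeping the URL list in a file also makes it easy to inspect or edit the list before anything is downloaded.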
sed -nr 's/http\S*(jpg|gif|other|ext)/\
curl $CURLOPTS & >$OUT/p' <$infile | sh -n
The above command searches $infile for any string beginning with "http", followed by any number of non-whitespace characters, and ending with any of the "|"-separated file extensions contained in the parentheses. Because -r selects extended regular expressions, the alternation bar is written unescaped.
Once it has found such a string, sed substitutes it into the curl command line on the second line, in place of "&". It then pipes the command string to sh for execution.
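One thing to watch: the s command replaces only the matched text and p prints the whole line, so on input like the sample line in the question, the text around the URL would also reach sh. A hedged sketch of one way around that is to isolate the URLs with grep -o first and let sed wrap each one in a curl command. The function name `make_curl_cmds`, the file name `input.txt`, and the `-sS -O` options are illustrative assumptions.

```shell
#!/bin/sh
# Turn every jpg/gif URL in the file given as $1 into one curl command per line.
make_curl_cmds() {
    grep -oE 'http[^"[:space:]]*\.(jpg|gif)' "$1" |
        # "&" stands for the whole matched line, here a single URL.
        sed "s/.*/curl -sS -O '&'/"
}

# Dry run, printing the commands for inspection (hypothetical file name):
#   make_curl_cmds input.txt
# Execute them once they look right:
#   make_curl_cmds input.txt | sh
```

Single-quoting the URL in the generated command keeps sh from expanding characters such as `$` or `&` that may appear in a query string. A URL that itself contains a single quote would still break this.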
Remember, sed is the stream editor, not just the stream searcher, so it can very capably pre-process input for other commands to make them do what you want.
Note: sh is currently passed the -n ("noexec") option, so it only reads the generated commands and checks their syntax without running anything. Once you've run it a few times and are satisfied it is doing the right thing, remove -n so the commands actually take effect.
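To see the effect of -n in isolation, here is a tiny sketch using a harmless echo command in place of curl:

```shell
# With -n the shell parses its input but runs nothing, so this prints nothing.
printf 'echo hello\n' | sh -n

# Without -n the same input is executed.
printf 'echo hello\n' | sh
```

To preview the generated commands themselves rather than just syntax-check them, drop the `| sh -n` part and read sed's output directly.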
Note 2: If there's a chance you'll want to match more than one URL per line, you'll need sed's 'g' flag on the substitution.
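A small sketch of the difference the g flag makes. The sample line is made up, and the angle brackets merely mark what was replaced; `-E` is used here as the portable spelling of extended-regex mode.

```shell
# Hypothetical line holding two URLs separated by a space.
line='http://example.com/a.jpg http://example.com/b.gif'

# Without g, only the first match on the line is rewritten.
printf '%s\n' "$line" | sed -E 's/http[^ ]*(jpg|gif)/<&>/'
# prints: <http://example.com/a.jpg> http://example.com/b.gif

# With g, every match on the line is rewritten.
printf '%s\n' "$line" | sed -E 's/http[^ ]*(jpg|gif)/<&>/g'
# prints: <http://example.com/a.jpg> <http://example.com/b.gif>
```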