
Match URL pattern within file using SED, AWK or GREP

I am trying to use grep to extract a list of URLs beginning with http and ending with jpg.

grep -o 'picturesite.com/wp-content/uploads/.......' filename

The code above is how far I've gotten. I then need to pass these file names to curl.

title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker"

You can capture URL patterns by doing:

grep -o 'http[^"]*\.jpg' file

$ grep -o 'http[^"]*\.jpg' <<EOF
> title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker"
> EOF
http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg

curl does not take URLs from standard input, so your best bet is to store the extracted URLs in a file, then read the file one line at a time and pass the variable that holds each line to the curl command.
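The two steps described above (store the URLs, then loop over them) might look roughly like the sketch below. The file names, the fetch_all function and the DRY_RUN switch are hypothetical names invented for illustration, and the grep pattern assumes every URL sits inside double quotes, as in the sample line from the question. By default the loop only prints the curl commands; set DRY_RUN=0 to actually download.

```shell
#!/bin/sh
# Sketch only: sample.txt, urls.txt, fetch_all and DRY_RUN are hypothetical names.
workdir=$(mktemp -d)
infile="$workdir/sample.txt"
urls="$workdir/urls.txt"

# Sample input taken from the question.
cat > "$infile" <<'EOF'
title : "Family Vacation", jpg:"http://picturesite.com/wp-content/uploads/2014/01/mypicture.jpg", owner : "PhotoTaker"
EOF

# Step 1: store the extracted URLs in a file, one per line.
# Assumption: URLs are delimited by double quotes in the input.
grep -o 'http[^"]*\.jpg' "$infile" > "$urls"

# Step 2: read the file one line at a time and hand each line to curl.
# With DRY_RUN=1 (the default here) the command is printed, not run.
fetch_all() {
    while IFS= read -r url; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'curl -O %s\n' "$url"
        else
            curl -O "$url"   # -O saves the download under its remote file name
        fi
    done < "$1"
}

fetch_all "$urls"
```

Keeping the list in a file also makes it easy to inspect or hand-edit the URLs before anything is fetched.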

sed -nr 's/.*(http\S*\.(jpg|gif|other|ext)).*/curl $CURLOPTS \1 >$OUT/p' <"$infile" | sh -n

The above command searches $infile for a string beginning with "http", followed by any number of non-whitespace characters, and ending in a dot plus one of the "|"-separated file extensions inside the parentheses. (With -r, sed uses extended regular expressions, so the alternation is written with a bare "|".)

Once sed finds such a string, it replaces the whole line with a curl command line in which "\1" stands for the matched URL. It then pipes the command string to sh for execution. Because $CURLOPTS and $OUT sit inside single quotes, they are expanded by sh rather than by the shell that runs sed, so they need to be exported.

Remember, sed is the stream editor, not just the stream searcher, so it can very capably pre-process input for other commands to make them do what you want.

Note: sh is currently passed the -n ("noexec") option, which makes it read the commands and check their syntax without executing them. When you've run it a few times and are satisfied that it's doing the right thing, remove -n so the commands actually run.

Note 2: If there's a chance you'll want to match more than one URL per line, you'll need sed's 'g' flag, which only helps when the pattern matches the URL alone and not the text around it.
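As a small illustration of Note 2, the sketch below runs the same substitution with and without the 'g' flag on a line that holds two URLs. The example.com URLs, the variable names and the bracket markers are made up for the demonstration, and -E is used as the more portable spelling of -r.

```shell
#!/bin/sh
# Hypothetical input line containing two quoted URLs.
line='a:"http://example.com/1.jpg", b:"http://example.com/2.jpg"'

# Without 'g', only the first match on the line is substituted.
first_only=$(printf '%s\n' "$line" | sed -E 's/http[^" ]*\.jpg/[&]/')

# With 'g', every match on the line is substituted.
all=$(printf '%s\n' "$line" | sed -E 's/http[^" ]*\.jpg/[&]/g')

printf '%s\n%s\n' "$first_only" "$all"
```

The first printed line has brackets around the first URL only, while the second has them around both.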
