How do I extract URLs from a file? My file is named URL_name.txt.
The file contains many URLs. It looks like this:
<pre>
<pre><div></pre><something>something here<href="http://www.google.com/">something here</font>
<font><href="http://www.stackoverflow.com/">something</td>
..
..
..
</pre>
My idea is to remove everything before each URL and then everything after it. How can I do that with sed? The output should be:
http://www.google.com/
http://www.stackoverflow.com/
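The "strip before, then strip after" idea maps directly onto two sed substitutions. This is a minimal sketch, assuming every URL sits in double quotes right after `href=` and that there is at most one `href` per line (the greedy `.*` keeps only the last one on a line). The function name `extract_hrefs_sed` is hypothetical.

```shell
# Hypothetical helper: print the quoted value after href= on each line.
# Assumes one href="..." per line; with several, greedy .* keeps the last.
extract_hrefs_sed() {
  # -n          : print nothing unless told to
  # /href="/{   : only act on lines that contain href="
  #   s/.*href="//  remove everything up to and including href="
  #   s/".*//       remove the closing quote and everything after it
  #   p             print what is left, i.e. the URL
  sed -n -e '/href="/{' -e 's/.*href="//' -e 's/".*//' -e 'p' -e '}' "$1"
}

# Demo on the sample lines from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pre><div></pre><something>something here<href="http://www.google.com/">something here</font>
<font><href="http://www.stackoverflow.com/">something</td>
EOF
extract_hrefs_sed "$sample"
rm -f "$sample"
```

On the two sample lines this prints http://www.google.com/ and http://www.stackoverflow.com/.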
Use tr and grep:
tr '"' '\n' < URL_name.txt | grep http
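This works because tr turns every double quote into a newline, so each quoted attribute value lands on a line of its own, and grep then keeps the lines containing "http". One caveat: any other fragment that merely mentions "http" is printed too. The sketch below wraps the command in a function and adds a stricter variant that keeps only lines starting with http:// or https://; both function names are hypothetical.

```shell
# The tr + grep answer, wrapped in a hypothetical function.
extract_tr_grep() {
  # Each " becomes a line break, so quoted values end up on their own lines.
  tr '"' '\n' < "$1" | grep http
}

# Hypothetical stricter variant: keep only lines that begin with
# http:// or https://, dropping text that just contains "http" somewhere.
extract_tr_grep_strict() {
  tr '"' '\n' < "$1" | grep -E '^https?://'
}

# Demo on the sample lines from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pre><div></pre><something>something here<href="http://www.google.com/">something here</font>
<font><href="http://www.stackoverflow.com/">something</td>
EOF
extract_tr_grep_strict "$sample"
rm -f "$sample"
```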
It can also be done in Java, but these shell commands work too:
grep -Ei 'href=.*>' URL_name.txt | cut -d '"' -f 2 | grep ://
grep -Ei 'href=.*>' URL_name.txt | awk -F'"' '{print $2}' | grep ://
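Both pipelines above print only the second double-quote-separated field of each matching line, so if a line holds several links, only the first is kept. A possible variation, sketched here under the same assumption that URLs are double-quoted after `href=`, is to let `grep -o` isolate each `href="..."` match first and then cut the quotes off. The function name is hypothetical.

```shell
# Hypothetical helper that handles several href="..." values per line.
extract_all_hrefs() {
  # -E: extended regex, -i: match href or HREF, -o: print each match
  # on its own line. Each match looks like href="...", so the URL is
  # the second field when splitting on double quotes.
  grep -Eio 'href="[^"]*"' "$1" | cut -d '"' -f 2
}

# Demo: two links on a single line.
sample=$(mktemp)
printf '%s\n' '<a href="http://www.google.com/">g</a><a HREF="http://www.stackoverflow.com/">s</a>' > "$sample"
extract_all_hrefs "$sample"
rm -f "$sample"
```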
You can use grep:
grep -o 'http://[^"]*' yourfile
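`grep -o` prints only the matched part of each line, one match per line, so every `http://` up to the next double quote comes out on its own. A small extension, sketched here as an assumption rather than part of the original answer, also accepts `https://` and stops at whitespace or angle brackets, so unquoted URLs do not swallow the rest of the line. The function name is hypothetical.

```shell
# Hypothetical helper: print every http:// or https:// URL in a file.
extract_urls_grep() {
  # -E enables "s?" (optional s); -o prints each match on its own line.
  # The URL ends at a double quote, < or >, or any whitespace.
  grep -Eo 'https?://[^"<>[:space:]]*' "$1"
}

# Demo on the sample lines from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pre><div></pre><something>something here<href="http://www.google.com/">something here</font>
<font><href="http://www.stackoverflow.com/">something</td>
EOF
extract_urls_grep "$sample"
rm -f "$sample"
```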