Manipulating strings in bash

Question

I have a file containing a Google results page from a search. I saved the page with:

w3m -no-cookie "$search" > google

After that, I need to get all the sites listed on that page, i.e. all the strings that start with "www" and end with "/".

I tried:

grep -Fw "www" google | awk -F "/" '{ print $1";" }'

but it also gives me everything on the line that comes before the "www".

How do I remove that?

Should I use sed?

Thanks!

Answer 1

Assuming that every site starts with "www" is a bit fragile (plenty of sites don't), but here it is:

Your problem is that grep prints the whole matching line. With -o it prints only the matched part, each match on its own line:

grep -wo "www.*" google | awk -F "/" '{ print $1";" }'
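To see what -o changes, here is a small self-contained sketch that runs both pipelines on one line. The sample line and the temporary file are made up for illustration, and it assumes a grep that supports -o and -w (GNU grep does).

```shell
#!/usr/bin/env bash
# Hypothetical sample: one line of w3m output with text before the URL.
tmp=$(mktemp)
printf '%s\n' 'Some result title www.example.com/page more text' > "$tmp"

# Without -o, grep prints the whole line, so awk's first "/"-separated
# field still contains the text before "www".
# prints: Some result title www.example.com;
grep -Fw "www" "$tmp" | awk -F "/" '{ print $1";" }'

# With -o, grep prints only the part from "www" to the end of the line,
# so the first "/"-separated field is just the host name.
# prints: www.example.com;
grep -wo "www.*" "$tmp" | awk -F "/" '{ print $1";" }'

rm -f "$tmp"
```

awk is still useful here only because "www.*" runs to the end of the line; the split on "/" is what trims the path off.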

or simply:

grep -wo "www[^/]*" google
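A sketch of the one-liner on a small made-up input (the two sample lines and the temp file are hypothetical). Since the question asks for strings that end with "/", the second command is a variant that keeps the slash; excluding whitespace from the match is my own assumption, added so a "www" with no "/" after it can't pull in the following words.

```shell
#!/usr/bin/env bash
# Hypothetical sample input standing in for the saved "google" file.
tmp=$(mktemp)
printf '%s\n' \
  'Some result title www.example.com/page more text' \
  'Another hit: www.test.org/ end' > "$tmp"

# As in the answer: everything from "www" up to (not including) the first "/".
# prints www.example.com and www.test.org, one per line
grep -wo "www[^/]*" "$tmp"

# Variant that keeps the trailing "/".
# -w is dropped: the "/" is often followed by a word character ("p" in
# "page"), which would make a whole-word match fail.
# prints www.example.com/ and www.test.org/, one per line
grep -o 'www[^/[:space:]]*/' "$tmp"

rm -f "$tmp"
```

On a real results page the same site can appear several times; piping the output through `sort -u` removes the duplicates.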