
grep a complete resource URL within a file

I need to search a file and extract addresses like this one:

http://deimos.apple.com/WebObjects/Core.woa/DownloadRedirectedTrackPreview/unina.it-dz.5373092572.05373092574.12739786322/enclosure.m4v

There are 38 links, and only the last series of digits changes between them.

I tried this regexp:

grep -io 'http://ex[a-z.-]*/[a-z0-9+-]*/[a-z0-9.,-+]*[.m4v]'

It extracts all the URLs in the file that point to an m4v file, but not the complete URL; instead it gets a partial URL like this:

http://deimos.apple.com/WebObjects/Core.woa/DownloadRedirectedTrackPreview/unina.

Where am I wrong?

I can't figure out why it happens.

Thanks a lot for your effort.
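Independently of the prefix, one detail of the pattern above may be worth isolating: [.m4v] is a bracket expression, so it matches a single character from the set {., m, 4, v}, not the literal suffix .m4v. That may be why a match can stop at a bare dot, like the one ending unina. in the partial result. A minimal sketch in POSIX sh with grep (the function names are hypothetical) contrasts it with the escaped literal \.m4v:

```shell
#!/bin/sh
# [.m4v] is a bracket expression: it matches ONE character from the set
# { '.', 'm', '4', 'v' }. The escaped form \.m4v matches the literal suffix.

bracket_matches() {
  # Prints each single-character match on its own line.
  printf '%s\n' "$1" | grep -o '[.m4v]'
}

literal_match() {
  # Prints the four-character string ".m4v" if it occurs.
  printf '%s\n' "$1" | grep -o '\.m4v'
}

bracket_matches 'enclosure.m4v'   # prints ., m, 4, v on four separate lines
literal_match 'enclosure.m4v'     # prints .m4v
```

With a bracket expression at the end, any trailing dot, m, 4 or v is enough to end a match, so the pattern never insists on a real .m4v extension.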

Your regex and the URL you want to extract do not match. The URL you list does not begin with:

http://ex

which your regex requires. You could change your regex to something like this, which would match your URL:

'http://(?:[a-z0-9.+-]+/)*[a-z0-9.+-]+\.m4v'
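The (?:...) group is PCRE syntax, which GNU grep accepts only with -P. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep both do), writes the same idea as a POSIX extended regex for grep -E and includes . in each character class so that dotted segments such as deimos.apple.com and Core.woa can match. The function name extract_m4v_urls is hypothetical:

```shell
#!/bin/sh
# Sketch: extract complete .m4v URLs using a POSIX extended regex (grep -E),
# so no PCRE-only syntax such as (?:...) is required.
#   -o  print only the matched text, one match per line
#   -i  ignore case (the path contains WebObjects, Core.woa, ...)
#   -E  extended regular expressions
# '.' is included in every bracket expression so dotted segments can match;
# \.m4v requires the literal suffix at the end of the match.
extract_m4v_urls() {
  # With no file arguments, grep reads standard input.
  grep -oiE 'http://([a-z0-9.+-]+/)*[a-z0-9.+-]+\.m4v' "$@"
}

# Demo on a line that embeds the URL from the question in surrounding text.
url='http://deimos.apple.com/WebObjects/Core.woa/DownloadRedirectedTrackPreview/unina.it-dz.5373092572.05373092574.12739786322/enclosure.m4v'
printf 'see <%s> here\n' "$url" | extract_m4v_urls
```

Because spaces, quotes and angle brackets are outside every character class, each match stops at the end of its URL, so several URLs on one line come out as separate lines.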

Sorry Jonathan, that was a typo in my post: in my actual regex I correctly used dei, not ex as written. But the problem persisted. Marc opened my mind: I knew how the address starts, so I tried grep -io ' http://dei / .m4v', with no success :-( Then fedorqui gave the last hint, that maybe the problem was a dot, so I tried grep -io ' http://deimos . / .m4v' :-D and it did the trick!
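The idea of anchoring on the known start of the address can be sketched as follows; this illustrates that approach and is not a reconstruction of the commands quoted above. It assumes, as in the example URL, that everything up to DownloadRedirectedTrackPreview/ is fixed and that each address ends in /enclosure.m4v; extract_by_prefix is a hypothetical name:

```shell
#!/bin/sh
# Sketch: anchor the pattern on the part of the address that never changes.
# Dots in the fixed prefix are escaped (\.) so they match only a literal '.',
# and the one varying path segment is limited to letters, digits, '.' and '-'.
# Assumption: every URL ends in /enclosure.m4v, as in the example address.
extract_by_prefix() {
  grep -o 'http://deimos\.apple\.com/WebObjects/Core\.woa/DownloadRedirectedTrackPreview/[A-Za-z0-9.-]*/enclosure\.m4v' "$@"
}
```

For example, extract_by_prefix page.html, where page.html stands for whatever file holds the links. An unescaped dot would match any character, so escaping the fixed part keeps the match tied to the real host and path.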

The files are podcasts of law lectures released as free (as in freedom), but in practice they are easy to get only for people who buy into Apple or Microsoft (iTunes). Now I have the file to give to wget to automate the downloads, without soiling my system with emulators and proprietary software.
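Feeding the extracted addresses to wget can be sketched like this, assuming GNU Wget's -i (--input-file) option, which reads one URL per line. build_url_list and the file names are hypothetical:

```shell
#!/bin/sh
# Sketch: turn a saved page into a de-duplicated URL list for `wget -i`.
build_url_list() {
  # $1: file to search   $2: file to write, one URL per line
  # awk '!seen[$0]++' drops repeated lines while keeping first-seen order,
  # so the lessons stay in the order they appear on the page.
  grep -oiE 'http://([a-z0-9.+-]+/)*[a-z0-9.+-]+\.m4v' "$1" | awk '!seen[$0]++' > "$2"
}

# Typical use (not executed here, since it downloads files):
#   build_url_list page.html urls.txt
#   wget -i urls.txt
```

awk is used instead of sort -u so the original order survives. Since every URL in the question ends in enclosure.m4v, wget will by default keep the first file and name later ones enclosure.m4v.1, enclosure.m4v.2, and so on; renaming afterwards, or fetching each URL with its own -O name, is one way around that.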

Thanks to all, indeed!
