I have a file with hundreds of links of the form: https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mkv
And sometimes the end of the line has mp4 instead of mkv, like below: https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mp4
I already tried the pattern 'http.+mp4' (and the same with mkv at the end) to get a single URL, but it keeps printing the whole line, because '.+' is greedy: it returns everything from the first http up to the last mp4 on the line.
How can I write the regex (using grep) so that it matches only one of the URLs, without the HTML garbage in the middle?
The final result needs to be https://file1.mp4 or https://file1_v2.mkv, with me specifying which one I want.
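You could exclude the double quote in your pattern. [^"]* matches any run of characters that are not a double quote, so a match cannot run past the quote that closes the first URL (the slashes need no escaping in grep):
grep -o 'https://[^"]*\.mp4' file
grep -o 'https://[^"]*\.mkv' file
or, for both types:
grep -E -o 'https://[^"]*\.(mp4|mkv)' file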
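As a rough check of the quote-excluding pattern, the sketch below runs it against the two sample lines from the question. The temporary file and the variable names are only illustrative, and the patterns are written without backslashes before the slashes, which grep does not need.

```shell
# Temporary sample file holding the two lines from the question
# (the file itself is an assumption made for this illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mkv
https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mp4
EOF

# -o prints each match on its own line; [^"]* cannot cross a double quote.
mp4_urls=$(grep -o 'https://[^"]*\.mp4' "$sample")
mkv_urls=$(grep -o 'https://[^"]*\.mkv' "$sample")
both_urls=$(grep -E -o 'https://[^"]*\.(mp4|mkv)' "$sample")

printf '%s\n' "$mkv_urls"
# https://file1_v2.mkv

printf '%s\n' "$mp4_urls"
# https://file1.mp4
# https://file1.mp4
# https://file1_v2.mp4

rm -f "$sample"
```

Choosing the extension in the pattern is what selects which kind of URL is printed. On a line where both URLs end in mp4, both are printed.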
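You can use the -o (or --only-matching) option of grep to print only the part of each line that matches the regex.
Then your regex could be something like:
grep -o 'https://[a-zA-Z0-9_.]*' file
This prints every URL on the line, each on its own line. It is not the best pattern if your real URLs contain characters other than the ones in the example (a hyphen or an extra slash, for instance), because the match stops at the first character outside the bracket list.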
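The limitation of the character-class pattern can be seen with a made-up URL that contains a hyphen. The name my-file.mp4 is hypothetical and not from the question. This is only a sketch of one possible adjustment: adding the hyphen at the end of the bracket list.

```shell
# Hypothetical input: a URL containing a hyphen, followed by the closing quote.
line='https://my-file.mp4" target="_blank"'

# The original class has no hyphen, so the match stops right before it.
truncated=$(printf '%s\n' "$line" | grep -o 'https://[a-zA-Z0-9_.]*')
printf '%s\n' "$truncated"
# https://my

# A hyphen placed last in the bracket list is taken literally.
complete=$(printf '%s\n' "$line" | grep -o 'https://[a-zA-Z0-9_.-]*')
printf '%s\n' "$complete"
# https://my-file.mp4
```

Every additional character that can occur in the URLs has to be added to the list in the same way, which is why excluding the double quote is often the simpler choice.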