
How to grep only the match at a desired position in a single line with multiple matches, using regex?

I have a file with hundreds of links of the form: https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mkv

And sometimes the line ends with mp4 instead of mkv, like this: https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mp4

I already tried the pattern 'http.+mp4' to get a single URL (and the same with mkv at the end), but it keeps printing the whole line, because '.+' is greedy: it matches everything from the first http to the last mp4.

How could I specify the regex (using grep) to match only one of the URLs, without the HTML garbage in the middle?

The final result needs to be https://file1.mp4 or https://file1_v2.mkv, with me specifying which one I want.

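You could replace '.+' with a negated character class that excludes the double quote, so a match cannot run past the end of one URL. The -o option prints only the matched text, and the slashes do not need escaping in grep:

grep -o 'https://[^"]*\.mp4' file
grep -o 'https://[^"]*\.mkv' file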

or, to match both types, use an alternation with -E (extended regex):

grep -E -o 'https://[^"]*\.(mp4|mkv)' file
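
Since grep -o prints every match on its own output line, one possible way to choose which URL you get is to filter that output by position. The following is only a sketch: the sample file and the names sample, first_urls and second_urls are made up for illustration, and it assumes that every input line contains exactly two URLs.

```shell
#!/bin/sh
# Hypothetical sample data modeled on the lines in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mkv
https://file2.mp4" target='_blank'>HD-MQ</a> | <a href="https://file2_v2.mp4
EOF

# Assumption: each line holds exactly two URLs, so grep -o emits them in pairs.
# Odd-numbered output lines are the first URL of a pair, even-numbered the second.
first_urls=$(grep -E -o 'https://[^"]*\.(mp4|mkv)' "$sample" | awk 'NR % 2 == 1')
second_urls=$(grep -E -o 'https://[^"]*\.(mp4|mkv)' "$sample" | awk 'NR % 2 == 0')

printf 'first URL of each line:\n%s\n' "$first_urls"
printf 'second URL of each line:\n%s\n' "$second_urls"

rm -f "$sample"
```

If the number of URLs per line varies, this pairing breaks down and a per-line tool such as awk or sed would be a safer choice.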

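You can use the -o (--only-matching) option so that grep prints only the part of each line that matches the regex, instead of the whole line.

Your regex could then be:

grep -o 'https://[a-zA-Z0-9_.]*' file

This pattern only covers URLs made of letters, digits, underscores and dots after https://. It stops at the first character outside that set (for example / or -), so it will need adjusting if your URLs look different from the ones shown.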
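
To illustrate that limitation, here is a small sketch comparing the character-class pattern with one that excludes only the double quote. The URL (example.com with a path and a hyphen) and the variable names line, narrow and wide are invented for this example and do not come from the question.

```shell
#!/bin/sh
# Hypothetical input line: the URL contains '/' and '-', which are
# outside the [a-zA-Z0-9_.] character class.
line='<a href="https://example.com/videos/file-1.mp4" target="_blank">HD</a>'

# The whitelist class stops at the first '/' after the host name.
narrow=$(printf '%s\n' "$line" | grep -o 'https://[a-zA-Z0-9_.]*')

# Excluding only the double quote keeps everything up to the closing quote.
wide=$(printf '%s\n' "$line" | grep -o 'https://[^"]*')

printf 'narrow: %s\n' "$narrow"   # narrow: https://example.com
printf 'wide:   %s\n' "$wide"     # wide:   https://example.com/videos/file-1.mp4
```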
