How to grep only the match at a desired position in a single line that contains multiple matches, using regex?

Question

I have a file with hundreds of links of the form: https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mkv

Sometimes the line ends with mp4 instead of mkv, like below: https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mp4

I already tried the pattern 'http.+mp4' (and the same with mkv at the end) to get a single URL, but it keeps printing the whole line. That is because '.+' is greedy: it matches everything from the first http to the last mp4 on the line.

How could I specify the regex (using grep) to match only one of the URLs, without the HTML garbage in the middle?

The final result needs to be https://file1.mp4 or https://file1_v2.mkv, with me specifying which one I want.

Answer 1

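You could exclude the double quote from the match with a negated bracket expression, so a match can never run past the quote that closes the first URL. The slashes do not need to be escaped in grep:

grep -o 'https://[^"]*\.mp4' file
grep -o 'https://[^"]*\.mkv' file

or both types at once, using extended regex for the alternation:

grep -E -o 'https://[^"]*\.(mp4|mkv)' file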
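
With -o, grep prints every match on its own line, so a line holding two URLs yields two output lines. To pick the URL by its position on each line, one possible approach is to run the pattern once per input line and keep only the N-th match. The following is a minimal sketch, not a definitive solution: the helper name nth_url is hypothetical, and the sample data simply repeats the two line shapes from the question.

```shell
#!/bin/sh
# nth_url N: read lines on stdin and print the N-th video URL found on each line.
# (nth_url is a hypothetical helper name, not a standard tool.)
nth_url() {
    n=$1
    while IFS= read -r line; do
        # grep -o emits one match per output line; sed -n "Np" keeps only the N-th.
        printf '%s\n' "$line" | grep -E -o 'https://[^"]*\.(mp4|mkv)' | sed -n "${n}p"
    done
}

# Sample input copied from the two line shapes in the question.
sample=$(cat <<'EOF'
https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mkv
https://file1.mp4" target='_blank'>HD-MQ</a> | <a href="https://file1_v2.mp4
EOF
)

first=$(printf '%s\n' "$sample" | nth_url 1)
second=$(printf '%s\n' "$sample" | nth_url 2)

printf 'first URL per line:\n%s\n' "$first"
printf 'second URL per line:\n%s\n' "$second"
```

On a real file the same helper would be called as nth_url 2 < file. Lines with fewer than N matches simply produce no output.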

Answer 2

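You can use grep's -o (--only-matching) option to print only the part of each line that matches the regex, instead of the whole line.

The regex could then be:

grep -o 'https://[a-zA-Z0-9_.]*' file

This is not a robust pattern: it only matches URLs made up of the characters listed in the bracket expression, so it will cut a URL short if your links contain any other character than the ones shown.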
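
To illustrate that limitation, here is a small sketch comparing the allow-list pattern with a pattern that excludes the double quote instead. The URL https://my-file.mp4 is a made-up example (it is not from the question), chosen because it contains a hyphen, which the allow-list does not cover.

```shell
#!/bin/sh
# Hypothetical input line: the hyphen in the URL is outside [a-zA-Z0-9_.].
line='https://my-file.mp4" target="_blank">HD-MQ</a>'

# Allow-list of URL characters: the match stops at the first unlisted character.
allow_list=$(printf '%s\n' "$line" | grep -o 'https://[a-zA-Z0-9_.]*')

# Excluding only the double quote: the match runs up to the closing quote.
exclude_quote=$(printf '%s\n' "$line" | grep -o 'https://[^"]*')

printf 'allow-list:    %s\n' "$allow_list"     # https://my
printf 'exclude quote: %s\n' "$exclude_quote"  # https://my-file.mp4
```

Which variant fits better depends on the data: the allow-list is safe when the URLs are known to use only those characters, while excluding the delimiter relies on every URL being followed by a double quote or the end of the line.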

2 answers

solution1
1 2019-12-16 00:10:50

solution2
0 2019-12-15 23:03:36