Regex pattern to match certain URLs

Question

I have a large text and I only want to extract certain information from it. The text looks like this:

Some random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_1_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_2_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_3_av.m3u8

I only want the http URLs. There are several of them in the text, but I only need one of them. The regular expression should express "starts with http and ends with .m3u8".
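A minimal sketch of that requirement, assuming each URL sits on its own line and contains no whitespace: with Pattern.MULTILINE, ^ and $ match at line boundaries, and calling find() only once stops at the first URL. The class and method names (FirstM3u8, firstM3u8) are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstM3u8 {
    // ^ and $ match at line boundaries because of MULTILINE.
    // \S+? assumes a URL never contains whitespace, so a match cannot run across words or lines.
    private static final Pattern HTTP_TO_M3U8 =
            Pattern.compile("^http\\S+?\\.m3u8$", Pattern.MULTILINE);

    // Returns the first line that starts with "http" and ends with ".m3u8", if there is one.
    static Optional<String> firstM3u8(String text) {
        Matcher m = HTTP_TO_M3U8.matcher(text);
        // A single find() call stops at the first match; later URLs are never examined.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String base = "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/"
                + "EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/";
        String text = String.join("\n",
                "Some random text here",
                base + "index_0_av.m3u8",
                "More random text here",
                base + "index_1_av.m3u8");
        // Prints the index_0 URL, the first one in the text.
        System.out.println(firstM3u8(text).orElse("no match"));
    }
}
```

Returning an Optional makes the "no URL in this text" case explicit instead of relying on null.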

I looked at the glossary of all the different expressions, but it is very confusing to me. I tried "/^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([az\\.]{12,30})([\\/\\w \\.-]*)*\\/?$/" as my pattern. Is that enough?
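One way to answer "is that enough?" is to run the attempt through java.util.regex. This sketch assumes the surrounding /.../ are JavaScript-style delimiters and drops them, since Pattern.compile does not use delimiters. Two things stand out: [az\.] is a class of just three characters ('a', 'z' and '.'), not the range a-z, and the pattern needs 12 to 30 of them after a dot; also, without Pattern.MULTILINE, ^ and $ tie the match to the whole input rather than to one line. The class name AttemptCheck is hypothetical.

```java
import java.util.regex.Pattern;

class AttemptCheck {
    // The question's pattern as a Java literal, without the /.../ delimiters.
    static final Pattern ATTEMPT = Pattern.compile(
            "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([az\\.]{12,30})([\\/\\w \\.-]*)*\\/?$");

    // matches() requires the entire string to fit the pattern.
    static boolean attemptMatches(String s) {
        return ATTEMPT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String url = "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/"
                + "EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8";
        // false: no dot in the URL is followed by 12+ characters drawn only from 'a', 'z' and '.'
        System.out.println(attemptMatches(url));
        // true: twelve characters from the three-character class [az.] satisfy {12,30}
        System.out.println(attemptMatches("http://foo.azazazazazaz"));
    }
}
```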

All help is appreciated. Thank you.

Answer 1

Assuming each item in your example sits on its own line, here's a snippet that works:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class M3u8Extractor {
    public static void main(String[] args) {
        String nl = System.lineSeparator();
        String text =
            "Some random text here" + nl +
            "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" + nl +
            "More random text here" + nl +
            "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" + nl +
            // removed some for brevity
            "More random text here" + nl +
            // added counter-example ending with "NOPE"
            "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.NOPE";

        // Multi-line pattern (MULTILINE makes ^ and $ match at each line boundary):
        //                           ┌ line starts with http
        //                           |    ┌ one or more characters, reluctantly quantified
        //                           |    |  ┌ escaped dot
        //                           |    |  |  ┌ ending text
        //                           |    |  |  |   ┌ end-of-line anchor
        //                           |    |  |  |   |
        Pattern p = Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        // Each find() advances to the next matching line; group() returns the whole match.
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

Output

http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8

Edit

To filter further by the URL's "index_x" file name, add it to the pattern between the protocol and the end of the line, e.g.:

Pattern.compile("^http.+?index_0.+?\\.m3u8$", Pattern.MULTILINE);
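One caveat with this kind of substring filter: "index_1.+?" would also accept a file named index_10_av.m3u8, because the pattern only checks that the line contains "index_1". A minimal sketch that avoids this, assuming the index is the run of digits right after "index_" in the last path segment, captures those digits and compares them as a number. The names IndexFilter and urlsForIndex are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexFilter {
    // Group 1 captures the digits after "/index_"; [^/\s]* keeps the rest inside the file name
    // (no further "/" and no whitespace, so the match stays on one line).
    private static final Pattern INDEXED_M3U8 =
            Pattern.compile("^http.+?/index_(\\d+)[^/\\s]*\\.m3u8$", Pattern.MULTILINE);

    // Returns every URL line whose index number equals the requested one.
    static List<String> urlsForIndex(String text, int index) {
        List<String> result = new ArrayList<>();
        Matcher m = INDEXED_M3U8.matcher(text);
        while (m.find()) {
            if (Integer.parseInt(m.group(1)) == index) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String base = "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/"
                + "EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/";
        String text = String.join("\n",
                "Some random text here",
                base + "index_1_av.m3u8",
                "More random text here",
                base + "index_10_av.m3u8");
        // Only the index_1 URL is printed; index_10 is parsed as 10 and skipped.
        urlsForIndex(text, 1).forEach(System.out::println);
    }
}
```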

Answer 2

I haven't tested it, but this should do the trick:

^(http:\/\/.*\.m3u8)
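Since this answer is untested, here is a quick check in Java against the question's sample text (a sketch; the class and method names are hypothetical). Without Pattern.MULTILINE, ^ matches only at the very start of the input, so on text that begins with "Some random text here" the pattern finds nothing; with the flag, ^ matches at the start of every line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Answer2Check {
    // Answer 2's pattern as a Java string literal; "\\/" is just an escaped "/".
    static final String ANSWER2 = "^(http:\\/\\/.*\\.m3u8)";

    // Collects group 1 of every match found with the given Pattern flags.
    static List<String> findAll(String text, int flags) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(ANSWER2, flags).matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String base = "http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/"
                + "EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/";
        String text = String.join("\n",
                "Some random text here",
                base + "index_0_av.m3u8",
                "More random text here",
                base + "index_1_av.m3u8");
        // Without MULTILINE, ^ only matches at offset 0, where the text starts with "Some".
        System.out.println(findAll(text, 0).size());                 // 0
        // With MULTILINE, ^ matches at each line start.
        System.out.println(findAll(text, Pattern.MULTILINE).size()); // 2
    }
}
```

Note also that .* is greedy and the pattern has no $, so a line containing two .m3u8 URLs would yield one match running up to the last ".m3u8".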

Answer 3

This is @capnibishop's answer with a small change.

^(http://).*(/index_1)[^/]*\.m3u8$

I added the missing "$" anchor at the end, so the text must end exactly at ".m3u8". Looking only at the ending, this ensures it matches

http://something.m3u8

and not

http://something.m3u81

I also added a condition requiring index_1 at the start of the file name (the last path segment), which means it will match:

http://something/index_1_something_else.m3u8

and not

http://something/index_1/something_else.m3u8
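These examples can be checked by running the pattern through java.util.regex with find(), one candidate line at a time (a sketch; the class name Answer3Check is hypothetical). Note that the full pattern also rejects http://something.m3u8, because it requires /index_1; that example only illustrates the effect of the "$". For comparison, the same pattern without "$" accepts the .m3u81 case, since find() is satisfied by a prefix of the line.

```java
import java.util.regex.Pattern;

class Answer3Check {
    static final Pattern WITH_END = Pattern.compile("^(http://).*(/index_1)[^/]*\\.m3u8$");
    // Same pattern without the trailing $ anchor, for comparison.
    static final Pattern WITHOUT_END = Pattern.compile("^(http://).*(/index_1)[^/]*\\.m3u8");

    // find() may stop before the end of the line; ^ (no MULTILINE) pins the start to offset 0.
    static boolean accepts(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "http://something/index_1_something_else.m3u8", // true  / true
            "http://something/index_1/something_else.m3u8", // false / false: "/" after index_1
            "http://something.m3u8",                        // false / false: no /index_1
            "http://something/index_1_av.m3u81",            // false / true: only $ rejects it
        };
        for (String c : candidates) {
            System.out.println(accepts(WITH_END, c) + " / " + accepts(WITHOUT_END, c) + "  " + c);
        }
    }
}
```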

3 answers

Answer 1: score 1, 2015-04-27 12:30:55
Answer 2: score 0, 2015-04-27 12:24:46
Answer 3: score 0, 2015-04-27 12:41:09