How to parse out html links from a huge string with html links and other text (Java)

My question is: how can I go through a string, keep only the links, and discard everything else? I thought about using some kind of delimiter, but I don't know how to go about it in Java. Here is an example of what I am trying to do.

This is my String:

String myString = "The file is http: // www.   .com/hello.txt and the second file is "
                     + "http: // www.   .com/hello2.dat";

I would want the output to be:

"http: // www.   .com/hello.txt http: // www.   .com/hello2.dat"

Or each link could be added to an array separately. I just want some ideas; I'd like to write the code myself but am having trouble working out how to do it. Any help would be awesome.

You definitely want to use a regular expression. You'll need to find a good one for matching URLs, and look at Java's Pattern and Matcher classes.
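As a rough illustration of that suggestion, here is a minimal sketch using Pattern and Matcher. The class name LinkExtractor, the method extractLinks, and the example.com URLs are made up for this example, and the pattern is deliberately simple (it is an assumption, not a complete URL grammar): it treats anything starting with http:// or https:// up to the next whitespace as a link.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Simplified pattern (assumption): "http://" or "https://" followed by
    // one or more non-whitespace characters. Real-world URLs may need more care.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://\\S+");

    // Collects every substring of text that matches URL_PATTERN, in order.
    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            links.add(matcher.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical input modelled on the question; the URLs are placeholders.
        String myString = "The file is http://www.example.com/hello.txt and the second file is "
                + "http://www.example.com/hello2.dat";
        List<String> links = extractLinks(myString);
        // Join the links with single spaces, or keep using the list directly.
        System.out.println(String.join(" ", links));
    }
}
```

The list returned by extractLinks covers the "added to an array" variant from the question; String.join produces the single space-separated string.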

Regular expressions (regex) are built for this kind of work. Regex is like another mini-language to learn. Some would say the best book out there is Mastering Regular Expressions.

The Javadoc for Pattern and Matcher can only serve as a reference. It completely ignores the subtleties involved in regex.
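One example of the kind of subtlety meant here, sketched under assumptions: a greedy "non-whitespace" pattern will happily swallow punctuation that follows a link in ordinary prose. The class TrailingPunctuationDemo, the helper firstMatch, the sample sentence, and both patterns are hypothetical and only illustrate the effect; they are not a recommended URL matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingPunctuationDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sentence where a comma directly follows the link.
        String text = "See http://www.example.com/a.txt, then stop.";

        // Naive pattern: the greedy \S+ also consumes the trailing comma.
        System.out.println(firstMatch("https?://\\S+", text));

        // Variant: the match must end on a character that is neither
        // whitespace nor common sentence punctuation, so the regex engine
        // backtracks off the comma.
        System.out.println(firstMatch("https?://\\S*[^\\s.,;:!?)]", text));
    }
}
```

The second pattern is still only a heuristic; a URL that legitimately ends in one of the excluded characters would be cut short.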
