
Finding and replacing a URL using regex in Java

I am trying to replace a URL in a string by passing a regex to String.replace. The code is below:

public class Test {
    public static void main(String[] args) {
        String test = "https://google.com";
        //String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
        String regex = "(http?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"; // does not match <http://google.com>

        String newText = test.replace(regex, "");
        System.out.println(newText);
    }
}

I have looked at several questions about this on SO, but my code does not replace the pattern. Can someone please tell me how to achieve that?

String.replace() does not accept a regular expression; it treats its first argument as literal text. Use String.replaceAll instead:

String newText = test.replaceAll(regex, "");

As far as the regex is concerned, it should match https as well. In your pattern, http? makes the final "p" optional, so it matches "htt" or "http" but never "https". Use https? instead:

String regex = "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
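To make the difference concrete, here is a minimal sketch that runs both methods on the same input with the corrected pattern. The class and method names (ReplaceVsReplaceAll, literalReplace, regexReplace) are made up for illustration and are not part of the original question.

```java
public class ReplaceVsReplaceAll {
    // Corrected pattern from the answer above: "https?" makes the "s" optional.
    static final String URL_REGEX =
        "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

    // String.replace treats its first argument as literal text, not as a regex.
    static String literalReplace(String input) {
        return input.replace(URL_REGEX, "");
    }

    // String.replaceAll compiles its first argument as a regular expression.
    static String regexReplace(String input) {
        return input.replaceAll(URL_REGEX, "");
    }

    public static void main(String[] args) {
        String test = "https://google.com";
        // The literal pattern text never occurs in the input, so nothing changes.
        System.out.println("[" + literalReplace(test) + "]");
        // The regex matches the whole URL, so it is removed.
        System.out.println("[" + regexReplace(test) + "]");
    }
}
```

With replace, the input comes back unchanged because the pattern text itself does not appear in the string. With replaceAll, the URL is removed.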

You cannot use a regex with replace; use replaceAll instead, i.e.:

    // PatternSyntaxException lives in java.util.regex
    String test = "something https://google.com something";
    try {
        String newText = test.replaceAll("(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]", "");
        System.out.println(newText);
    } catch (PatternSyntaxException ex) {
        // Syntax error in the regular expression
    } catch (IllegalArgumentException ex) {
        // Syntax error in the replacement text (unescaped $ signs?)
    } catch (IndexOutOfBoundsException ex) {
        // Non-existent backreference used in the replacement text
    }

Output:

something  something

Live Demo:

http://ideone.com/Yi2hrb
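If the same pattern is applied to many strings, or the replacement text may contain "$" or "\", one possible variation is to compile the pattern once and quote the replacement. This is only a sketch: the class name UrlScrubber and the method replaceUrls are hypothetical, while Pattern.compile, Matcher.replaceAll and Matcher.quoteReplacement are standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlScrubber {
    // Compiled once and reused; String.replaceAll would recompile on every call.
    private static final Pattern URL = Pattern.compile(
        "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Replaces every URL in the input with the given text, taken literally.
    static String replaceUrls(String input, String replacement) {
        // quoteReplacement escapes "$" and "\" so they are not read as group references.
        return URL.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String test = "something https://google.com something";
        System.out.println(replaceUrls(test, ""));
        // Without quoting, "$u" in the replacement would be rejected as an illegal group reference.
        System.out.println(replaceUrls(test, "<$url>"));
    }
}
```

Note that the last character class in the pattern excludes ".", so a sentence-ending period after a URL is left in place rather than swallowed by the match.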


Regex Explanation:

(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]

Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Default line breaks; Regex syntax only

Match the regex below and capture its match into backreference number 1 «(https?|ftp|file)»
   Match this alternative «https?»
      Match the character string “http” literally «http»
      Match the character “s” literally «s?»
         Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Or match this alternative «ftp»
      Match the character string “ftp” literally «ftp»
   Or match this alternative «file»
      Match the character string “file” literally «file»
Match the character string “://” literally «://»
Match a single character present in the list below «[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   The literal character “-” «-»
   A character in the range between “a” and “z” «a-z»
   A character in the range between “A” and “Z” «A-Z»
   A character in the range between “0” and “9” «0-9»
   A single character from the list “+&@#/%?=~_|!:,.;” «+&@#/%?=~_|!:,.;»
Match a single character present in the list below «[-a-zA-Z0-9+&@#/%=~_|]»
   The literal character “-” «-»
   A character in the range between “a” and “z” «a-z»
   A character in the range between “A” and “Z” «A-Z»
   A character in the range between “0” and “9” «0-9»
   A single character from the list “+&@#/%=~_|” «+&@#/%=~_|»
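The breakdown above notes that the scheme is captured into backreference number 1. As a rough illustration of what that capture gives you, the sketch below reads group 1 from each match and also reuses it as $1 in a replacement string. The class SchemeGroupDemo and its methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeGroupDemo {
    private static final Pattern URL = Pattern.compile(
        "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Collects the text captured by group 1 (the scheme) for every URL found.
    static List<String> schemes(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = URL.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // Replaces each URL with a short label; $1 refers to capture group 1.
    static String labelUrls(String input) {
        return URL.matcher(input).replaceAll("[$1 link]");
    }

    public static void main(String[] args) {
        System.out.println(schemes("https://google.com and ftp://example.org/file.txt"));
        System.out.println(labelUrls("something https://google.com something"));
    }
}
```

If the scheme is not needed afterwards, the group could be written as a non-capturing group, (?:https?|ftp|file), which leaves the match itself unchanged.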
