
Java Regular Expression: How to replace two or more slashes with a single slash while ignoring http:// or https://

The current code to remove multiple slashes is:

path = path.replaceAll("/{2,}", "/");

This turns https://stackoverflow.com into https:/stackoverflow.com, which is not intended.

I did some research and came up with a negative lookbehind to ignore double slashes that have https: before them, but it only handles double slashes, not runs of three or more:

(?<!http\/\/)

I thought that if you could negate a 'sub' regular expression, it might look something like this: match two or more slashes, but not the two slashes that come right after https:.

\/{2,}.negate(https:(?=\/\/))

Is this possible?

You had the right idea with the negative lookbehind, but you shouldn't include the slashes themselves in the lookbehind. You want to match multiple slashes in all cases, and the negative lookbehind says "ignore this match if the preceding text is http:". So it would be something like

(?<!http:)/{2,}

to find all the slashes that you want to replace. You may, of course, wish to include other protocols such as https: and ftp:, with something like this:

(?<!(http:|https:|ftp:))/{2,}
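As a rough sketch of how this pattern behaves in Java, the snippet below applies it to a few sample strings. The class name SlashCollapser, the method name collapse, and the example.com inputs are made up for illustration; the regex is the one above, with the group written as non-capturing.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
class SlashCollapser {
    // Two or more slashes that are NOT immediately preceded by a listed scheme.
    private static final Pattern EXTRA_SLASHES =
            Pattern.compile("(?<!(?:http:|https:|ftp:))/{2,}");

    static String collapse(String path) {
        // Every matching run of slashes becomes a single slash.
        return EXTRA_SLASHES.matcher(path).replaceAll("/");
    }

    public static void main(String[] args) {
        System.out.println(collapse("https://stackoverflow.com"));  // https://stackoverflow.com
        System.out.println(collapse("https://example.com//a///b")); // https://example.com/a/b
        System.out.println(collapse("/var//log///app.log"));        // /var/log/app.log
    }
}
```

Java accepts alternatives of different lengths inside a lookbehind as long as each has a bounded length, which is the case here.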

Here is my final solution in Java:

String path = "http:///baidu.com///a//b/c";
path = path.replaceFirst("(?<=(http:|https:|ftp:))/{3,}", "//");
path = path.replaceAll("(?<!(http:|https:|ftp:))/{2,}", "/");

The second line replaces the first run of three or more slashes directly after the protocol with a double slash. It uses a positive lookbehind (?<=...).

The third line replaces every remaining run of two or more slashes with a single slash. It uses a negative lookbehind (?<!...).
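The two steps described above could be wrapped in a reusable method with precompiled patterns, roughly as follows. This is only a sketch: the names UrlSlashNormalizer, normalize, AFTER_SCHEME and ELSEWHERE are hypothetical, and the scheme list is simply the one used in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper for illustration; names are not from the original post.
class UrlSlashNormalizer {
    private static final String SCHEMES = "(?:http:|https:|ftp:)";

    // Step 1: three or more slashes directly after a scheme (positive lookbehind).
    private static final Pattern AFTER_SCHEME =
            Pattern.compile("(?<=" + SCHEMES + ")/{3,}");

    // Step 2: two or more slashes not directly after a scheme (negative lookbehind).
    private static final Pattern ELSEWHERE =
            Pattern.compile("(?<!" + SCHEMES + ")/{2,}");

    static String normalize(String path) {
        String afterStep1 = AFTER_SCHEME.matcher(path).replaceFirst("//");
        return ELSEWHERE.matcher(afterStep1).replaceAll("/");
    }

    public static void main(String[] args) {
        System.out.println(normalize("http:///baidu.com///a//b/c")); // http://baidu.com/a/b/c
        System.out.println(normalize("ftp:////host//dir"));          // ftp://host/dir
        System.out.println(normalize("/a//b"));                      // /a/b
    }
}
```

Compiling the patterns once avoids recompiling the regex on every call, which String.replaceAll and String.replaceFirst would otherwise do.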
