
Remove a matching string using a regular expression in Java

This is my code; please check it. In the end, I want to remove "list-style-image: url(images/dot.gif);" from the string:

String temp = "font-family: Arial, Helvetica, sans-serif;font-size: 11px;color: F143F;list-style-image: url(images/dot.gif);list-style-type: none;"; 

Pattern pxPattern = Pattern.compile("([a-z]+-)+([a-z]+):(\\s)url\\(.*?\\);");

Matcher pxMatcher = pxPattern.matcher(temp);

while(pxMatcher.find()) {
    System.out.println(pxMatcher.group());
    String urlString =pxMatcher.group();
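    // matches() must match the entire string, so this test is always false and the negated condition always true
    if(!urlString.matches("http://|https://")) {
        System.out.println("Firts: "+temp.trim());
        System.out.println(urlString);
        // replaceAll() treats urlString as a regex: "(" and ")" form a group instead of
        // matching the literal parentheses, so nothing is removed here
        System.out.println(temp.replaceAll(urlString, ""));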
        //System.out.println("Remove: "+temp);
    }
}

This is a general answer to the question title; it may not directly address the specifics of the question. Suppose we have a regex string called PATTERN and an input string called body. Then we can remove all matches of PATTERN from body as follows:

// needs java.util.regex.Matcher and java.util.regex.Pattern
StringBuilder builder = new StringBuilder();
int x = 0; // index just after the previous match
Matcher m = Pattern.compile(PATTERN).matcher(body);
while (m.find()) {
  builder.append(body.substring(x, m.start())); // text between the previous match and this one
  x = m.end();                                  // skip over the current match
}
builder.append(body.substring(x)); // text after the last match
return builder.toString();

E.g. if PATTERN = "XOX" and body = "Hello XOXWorldXOX", then we get back "Hello World".

How it works: iterate through the matches, keeping x as the index just after the previous match. For each match, append the substring from x up to the start of the current match to a StringBuilder, then move x forward to the end of the current match. After the loop, append whatever follows the last match (without this step, a body such as "Hello XOXWorld" would lose its trailing "World"). Finally, build the string.

Note: beny23's answer is better for simply removing a regex from a string. However, with a small tweak, the code above can be made more general: it can replace each successive occurrence of the regex with a different replacement string. This is more powerful than String.replaceAll, which uses the same replacement for every match, but it's an odd corner case that probably doesn't crop up often. Still, to show what I mean, suppose that instead of removing each match we want to replace the first match with "match_1", the second with "match_2", and so on. We can do this:

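StringBuilder builder = new StringBuilder();
int x = 0;
int matchNumber = 1;
Matcher m = Pattern.compile(PATTERN).matcher(body);
while (m.find()) {
  builder.append(body.substring(x, m.start()));
  builder.append("match_" + matchNumber); // numbered replacement for this match
  matchNumber++;                          // the next match gets the next number
  x = m.end();
}
builder.append(body.substring(x)); // text after the last match
return builder.toString();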

E.g. if PATTERN = "XOX" and body = "Hello XOXWorldXOX", then we get back "Hello match_1Worldmatch_2".
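On Java 9 or later, the same numbered replacement can probably be written more compactly with Matcher.replaceAll(Function<MatchResult, String>), which calls the function once per match and does the copy-between-matches bookkeeping itself. A minimal sketch, assuming Java 9+; the class NumberedReplace and the method numberMatches are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberedReplace {
    // Hypothetical helper: replaces the n-th match of regex in body with "match_n".
    static String numberMatches(String regex, String body) {
        Matcher m = Pattern.compile(regex).matcher(body);
        int[] counter = {0}; // array so the lambda can update it (captured locals must be effectively final)
        // The function's result is treated as a replacement string ($ and \ are special),
        // so it is quoted in case it ever contains those characters.
        return m.replaceAll(r -> Matcher.quoteReplacement("match_" + (++counter[0])));
    }

    public static void main(String[] args) {
        System.out.println(numberMatches("XOX", "Hello XOXWorldXOX")); // Hello match_1Worldmatch_2
    }
}
```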

With a little more tweaking, we could generalise the above to replace each successive match with the next element of an array, making it truly general.
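Following that suggestion, here is a minimal sketch that replaces the first match with replacements[0], the second with replacements[1], and so on, using the same substring loop as above. The method name replaceEach is hypothetical, and what happens when there are more matches than array elements (the extra matches are left unchanged) is an assumption of this sketch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceEach {
    // Hypothetical helper: the i-th match of regex is replaced by replacements[i];
    // matches beyond the end of the array are copied through unchanged.
    static String replaceEach(String regex, String body, String[] replacements) {
        StringBuilder builder = new StringBuilder();
        int x = 0; // index just after the previous match
        int i = 0; // number of matches seen so far
        Matcher m = Pattern.compile(regex).matcher(body);
        while (m.find()) {
            builder.append(body, x, m.start()); // text before this match
            builder.append(i < replacements.length ? replacements[i] : m.group());
            i++;
            x = m.end();
        }
        builder.append(body, x, body.length()); // text after the last match
        return builder.toString();
    }

    public static void main(String[] args) {
        String[] words = {"Big", "Wide"};
        System.out.println(replaceEach("XOX", "Hello XOXWorldXOX", words)); // Hello BigWorldWide
    }
}
```

Because the replacements are appended literally rather than passed through a replacement-string parser, characters such as $ and \ in the array need no escaping.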

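It works fine for me if I use replace(), which treats its argument as a literal string rather than a regex, and assign the result back to temp: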

while(pxMatcher.find()) {
    System.out.println(pxMatcher.group());
    String urlString =pxMatcher.group();
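    if(!urlString.matches("http://|https://")) {
        System.out.println("Firts: "+temp.trim());
        System.out.println(urlString);
        // replace() matches urlString literally, so the parentheses need no escaping;
        // Strings are immutable, so the result is assigned back to temp
        temp = temp.replace(urlString, "");
        System.out.println("Remove: "+temp);
    }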
}

The output is:

list-style-image: url(images/dot.gif);
Firts: font-family: Arial, Helvetica, sans-serif;font-size: 11px;color: F143F;list-style-image: url(images/dot.gif);list-style-type: none;
list-style-image: url(images/dot.gif);
Remove: font-family: Arial, Helvetica, sans-serif;font-size: 11px;color: F143F;list-style-type: none;
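Since pxPattern already locates the declaration, the loop may not be needed at all: Matcher.replaceAll("") removes every match in one call. A minimal sketch reusing the question's string and pattern; the class and method names are only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveUrlDeclarations {
    // The question's pattern: a hyphenated property name, a colon, one whitespace character, then url(...);
    private static final Pattern PX_PATTERN = Pattern.compile("([a-z]+-)+([a-z]+):(\\s)url\\(.*?\\);");

    // Hypothetical helper: returns css with every matching url(...) declaration removed.
    static String removeUrlDeclarations(String css) {
        Matcher pxMatcher = PX_PATTERN.matcher(css);
        return pxMatcher.replaceAll(""); // returns a new String; the input is unchanged
    }

    public static void main(String[] args) {
        String temp = "font-family: Arial, Helvetica, sans-serif;font-size: 11px;color: F143F;"
                + "list-style-image: url(images/dot.gif);list-style-type: none;";
        temp = removeUrlDeclarations(temp);
        System.out.println(temp);
        // font-family: Arial, Helvetica, sans-serif;font-size: 11px;color: F143F;list-style-type: none;
    }
}
```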

I would remove the list-style-image declaration as follows (rather than using a while loop, this can be done in one line):

temp = temp.replaceAll("list-style-image:[^;]+;?", "");

To explain:

  • This looks for the literal text list-style-image: (including the colon),
  • then one or more characters that aren't a semicolon,
  • then an optional semicolon.

This removes the list-style-image declaration whether it appears in the middle or at the end of your string; because the trailing semicolon is optional, a final declaration without one is removed as well.
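To check that claim, the one-liner can be run on the question's string, where the declaration sits in the middle, and on a hypothetical string where it comes last without a trailing semicolon. A minimal sketch; the class name and the second input are made up for illustration:

```java
public class RemoveListStyleImage {
    // beny23's one-liner wrapped in a method; replaceAll returns a new String
    static String strip(String css) {
        return css.replaceAll("list-style-image:[^;]+;?", "");
    }

    public static void main(String[] args) {
        String middle = "font-family: Arial, Helvetica, sans-serif;font-size: 11px;color: F143F;"
                + "list-style-image: url(images/dot.gif);list-style-type: none;";
        String end = "color: F143F;list-style-image: url(images/dot.gif)"; // hypothetical input
        System.out.println(strip(middle));
        // font-family: Arial, Helvetica, sans-serif;font-size: 11px;color: F143F;list-style-type: none;
        System.out.println(strip(end));
        // color: F143F;
    }
}
```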

Result:

font-family: Arial, Helvetica, sans-serif;font-size: 11px;color: F143F;list-style-type: none; 

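Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.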

 
© 2020-2024 STACKOOM.COM