

How to replace a string using Java regex

I have a file that contains self-closing anchor tags:

    <p><a name="impact"/><span class="sectiontitle">Impact</span></p>
    <p><a name="Summary"/><span class="sectiontitle">Summary</span></p>

I want to correct the tags as shown below:

    <p><a name="impact"><span class="sectiontitle">Impact</span></a></p>
    <p><a name="Summary"><span class="sectiontitle">Summary</span></a></p>

I've written this code to find and replace the incorrect anchor tags:

    package mypack;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class AnchorIssue {

        static int count = 0;

        public static void main(String[] args) throws IOException {
            // Self-closing <a name="..."/> followed by everything up to the
            // last '<' on the line (both .* are greedy)
            Pattern pFinder = Pattern.compile("<a name=\\\".*\\\"(\\/)>(.*)(<)");
            // try-with-resources closes the reader when reading is done
            try (BufferedReader r = new BufferedReader(new FileReader("D:/file.txt"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    Matcher m1 = pFinder.matcher(line);
                    while (m1.find()) {
                        int start = m1.start(0);
                        int end = m1.end(0);

                        // The whole match (group 0) is the substring start..end
                        String actual = line.substring(start, end);
                        count++;
                        System.out.println(count + "." + "Actual String :-" + actual);
                    }
                }
            }
        }
    }



The above code finds the correct number of self-closing anchor tags in the file, but the replace code is not working properly.

Your problem is greediness, i.e. the .*" will match everything up to the last " in that line. There are two fixes for this. Both replace this line:

    Pattern pFinder = Pattern.compile("<a name=\\\".*\\\"(\\/)>(.*)(<)");

Option one: use a negated character class:

    Pattern pFinder = Pattern.compile("<a name=\\\"[^\\\"]*\\\"(\\/)>(.*)(<)");

Option two: use a lazy (reluctant) quantifier:

    Pattern pFinder = Pattern.compile("<a name=\\\".*?\\\"(\\/)>(.*)(<)");
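To see the difference concretely, here is a minimal sketch that captures the name value from one of the sample lines with a greedy group and with both fixes. The class NameCaptureDemo and the helper captureName are hypothetical names used only for illustration, and the patterns are trimmed down to the name="..." part.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameCaptureDemo {
    // Returns capture group 1 of the first match of regex in input, or null if none.
    static String captureName(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<p><a name=\"impact\"/><span class=\"sectiontitle\">Impact</span></p>";

        // Greedy .* runs to the end of the line, then backtracks to the LAST quote.
        // prints: impact"/><span class="sectiontitle
        System.out.println(captureName("<a name=\"(.*)\"", line));

        // Negated class [^"]* cannot cross a quote, so it stops at the first one.
        // prints: impact
        System.out.println(captureName("<a name=\"([^\"]*)\"", line));

        // Lazy .*? grows one character at a time and stops at the first quote.
        // prints: impact
        System.out.println(captureName("<a name=\"(.*?)\"", line));
    }
}
```

With nothing after the closing quote to force backtracking, the greedy group runs on to the quote that closes sectiontitle; both fixes stop at the quote that closes the name value.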

See more here.

Since the file structure seems "constant", it might be better to reduce the problem to simple string replacements rather than complex HTML matching. It seems to me that you're not really interested in the content of the anchor tag, so just replace /><span with ><span and </span></p> with </span></a></p>.
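A minimal sketch of that plain-string approach, assuming every self-closing anchor is followed directly by a <span> that ends right before </p>, as in the sample lines. SimpleReplace and fixLine are hypothetical names. Note that String.replace changes every occurrence, so any other </span></p> in the file would also get an </a>.

```java
public class SimpleReplace {
    // Assumes the fixed structure of the sample file:
    // <p><a name="..."/><span ...>...</span></p>
    static String fixLine(String line) {
        return line
                .replace("/><span", "><span")             // un-self-close the anchor
                .replace("</span></p>", "</span></a></p>"); // close it after the span
    }

    public static void main(String[] args) {
        // prints: <p><a name="impact"><span class="sectiontitle">Impact</span></a></p>
        System.out.println(fixLine(
                "<p><a name=\"impact\"/><span class=\"sectiontitle\">Impact</span></p>"));
    }
}
```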

Using the code below, I'm able to find and replace all self-closing anchor tags.

    package mypack;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class AnchorIssue {

        static int count = 0;

        public static void main(String[] args) throws IOException {
            // Group 1: "/><span", group 2: everything up to the last "</span>"
            // on the line, group 3: "</span>"
            Pattern pFinder = Pattern.compile("<a name=\\\".*?\\\"(\\/><span)(.*)(<\\/span>)");
            try (BufferedReader r = new BufferedReader(new FileReader("file.txt"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    Matcher m1 = pFinder.matcher(line);
                    while (m1.find()) {
                        int start = m1.start(0);
                        int end = m1.end(0);

                        String actual = line.substring(start, end);
                        count++;
                        System.out.println(count + "." + "Actual String : " + actual);

                        // replace() treats the group text literally; replaceAll()
                        // would interpret it as a regex
                        actual = actual.replace(m1.group(1), "><span");
                        actual = actual.replace(m1.group(3), "</span></a>");

                        System.out.println(count + "." + "Replaced : " + actual);
                    }
                }
            }
        }
    }
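If the corrected lines should also be written back out, one option (a sketch, not the code from this thread) is a single Matcher.replaceAll with capture groups, combining the negated class for the name value and a lazy quantifier for the span content. AnchorFixer and fixLine are hypothetical names, and the pattern assumes each anchor is immediately followed by one non-nested <span>...</span>, as in the sample file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AnchorFixer {
    // Group 1: the name value ([^"]* cannot run past the closing quote).
    // Group 2: the following <span ...>...</span> (.*? stops at the first </span>).
    private static final Pattern SELF_CLOSED_ANCHOR =
            Pattern.compile("<a name=\"([^\"]*)\"\\s*/>(<span[^>]*>.*?</span>)");

    // Rewrites every <a name="x"/><span ...>...</span> on the line as
    // <a name="x"><span ...>...</span></a>; other text is left unchanged.
    static String fixLine(String line) {
        return SELF_CLOSED_ANCHOR.matcher(line).replaceAll("<a name=\"$1\">$2</a>");
    }

    public static void main(String[] args) throws IOException {
        // Placeholder arguments: args[0] = input file, args[1] = output file.
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        List<String> fixed = Files.readAllLines(in, StandardCharsets.UTF_8).stream()
                .map(AnchorFixer::fixLine)
                .collect(Collectors.toList());
        Files.write(out, fixed, StandardCharsets.UTF_8);
    }
}
```

Run it as java AnchorFixer file.txt fixed.txt (both paths are placeholders). Writing to a separate file keeps the original intact until the result has been checked, and because the span part is lazy, two anchors on the same line are fixed independently.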



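Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.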

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM