I have a file that contains self-closing anchor tags:
<p><a name="impact"/><span class="sectiontitle">Impact</span></p>
<p><a name="Summary"/><span class="sectiontitle">Summary</span></p>
I want to correct the tags as shown below:
<p><a name="impact"><span class="sectiontitle">Impact</span></a></p>
<p><a name="Summary"><span class="sectiontitle">Summary</span></a></p>
I've written this code to find and replace the incorrect anchor tags:
package mypack;

import java.io.*;
import java.util.regex.*;

public class AnchorIssue {
    static int count = 0;

    public static void main(String[] args) throws IOException {
        Pattern pFinder = Pattern.compile("<a name=\\\".*\\\"(\\/)>(.*)(<)");
        BufferedReader r = new BufferedReader(new FileReader("D:/file.txt"));
        String line;
        while ((line = r.readLine()) != null) {
            Matcher m1 = pFinder.matcher(line);
            while (m1.find()) {
                int start = m1.start(0);
                int end = m1.end(0);
                ++count;
                // Use CharacterIterator.substring(offset, end);
                String actual = line.substring(start, end);
                System.out.println(count + "." + "Actual String :-" + actual);
                actual.replace(m1.group(1), "");
                System.out.println(actual);
                actual.replaceAll(m1.group(3), "</a><");
                System.out.println(actual);
                // Use CharacterIterator.substring(offset, end);
                System.out.println(count + "." + "Replaced" + actual);
            }
        }
        r.close();
    }
}
The above code reports the correct number of self-closing anchor tags in the file, but the replacement code is not working properly.
Your problem is greediness: the .*" in your pattern matches as much as possible, so it runs up to the last " on the line that still lets the rest of the pattern match. There are two fixes for this, and both replace this line:
Pattern pFinder = Pattern.compile("<a name=\\\".*\\\"(\\/)>(.*)(<)");
Option one: use a negated character class:
Pattern pFinder = Pattern.compile("<a name=\\\"[^\\\"]*\\\"(\\/)>(.*)(<)");
Option two: use a lazy (reluctant) quantifier:
Pattern pFinder = Pattern.compile("<a name=\\\".*?\\\"(\\/)>(.*)(<)");
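To make the difference between the three quantifier styles concrete, here is a minimal sketch. The class name GreedyDemo, the helper firstMatch, and the sample line with two anchors are assumptions made up for illustration; the patterns are shortened versions of the ones above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical line with two self-closing anchors, so greediness is visible
        String line = "<a name=\"a\"/><a name=\"b\"/>";

        // Greedy: .* runs to the last "/> on the line
        System.out.println(firstMatch("<a name=\".*\"/>", line));
        // prints <a name="a"/><a name="b"/>

        // Lazy: .*? stops at the first "/> it can reach
        System.out.println(firstMatch("<a name=\".*?\"/>", line));
        // prints <a name="a"/>

        // Negated character class: [^"]* can never cross a quote
        System.out.println(firstMatch("<a name=\"[^\"]*\"/>", line));
        // prints <a name="a"/>
    }
}
```

On this input the lazy and negated-class versions give the same result. The negated class is usually the stricter choice, because it cannot run past the closing quote even when the rest of the pattern fails to match.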
Since the file structure seems "constant", it might be better to reduce the problem to a few simple replacements instead of complex HTML matching. It seems to me that you're not really interested in the content of the anchor tag, so just replace "/><span" with "><span" and "</span></p>" with "</span></a></p>".
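A minimal sketch of that plain-replacement idea, assuming every line has the same shape as the samples in the question. The class name SimpleAnchorFix and the method fixLine are made up for illustration.

```java
public class SimpleAnchorFix {
    // Turns a self-closing anchor followed by a span into an anchor that wraps the span.
    // String.replace works on literal text (no regex) and returns a new string,
    // so the result has to be returned or assigned.
    static String fixLine(String line) {
        return line
                .replace("/><span", "><span")
                .replace("</span></p>", "</span></a></p>");
    }

    public static void main(String[] args) {
        String line = "<p><a name=\"impact\"/><span class=\"sectiontitle\">Impact</span></p>";
        System.out.println(fixLine(line));
        // prints <p><a name="impact"><span class="sectiontitle">Impact</span></a></p>
    }
}
```

Note that the second replacement also fires on any "</span></p>" that is not preceded by a self-closing anchor, so this only fits files where that never happens.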
Using the code below, I'm able to find and replace all self-closed anchor tags.
package mypack;

import java.io.*;
import java.util.regex.*;

public class AnchorIssue {
    static int count = 0;

    public static void main(String[] args) throws IOException {
        // Lazy .*? stops at the first closing quote of the name attribute
        Pattern pFinder = Pattern.compile("<a name=\\\".*?\\\"(\\/><span)(.*)(<\\/span>)");
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader r = new BufferedReader(new FileReader("file.txt"))) {
            String line;
            while ((line = r.readLine()) != null) {
                Matcher m1 = pFinder.matcher(line);
                while (m1.find()) {
                    ++count;
                    // The text matched by the whole pattern
                    String actual = m1.group();
                    System.out.println(count + "." + "Actual String : " + actual);
                    // group(1) is "/><span"; replace() treats it as literal text.
                    // Strings are immutable, so the result must be assigned back.
                    actual = actual.replace(m1.group(1), "><span");
                    System.out.println("\n");
                    // group(3) is "</span>"; add the closing </a> after it
                    actual = actual.replace(m1.group(3), "</span></a>");
                    System.out.println(count + "." + "Replaced : " + actual);
                    System.out.println("\n");
                    System.out.println("---------------------------------------------------");
                }
            }
        }
    }
}
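As an alternative sketch, the whole rewrite can be done in one step with capture groups and back-references in the replacement string. The class name AnchorRewrite, the method fixLine, and the exact pattern are assumptions for illustration, not part of the code above. The pattern assumes each self-closing anchor is immediately followed by a single span without nested spans.

```java
import java.util.regex.Pattern;

public class AnchorRewrite {
    // Group 1: the anchor name; group 2: the span element that follows the anchor.
    private static final Pattern SELF_CLOSED =
            Pattern.compile("<a name=\"([^\"]*)\"/>(<span[^>]*>.*?</span>)");

    // Returns the line with every self-closing anchor rewritten to wrap its span.
    static String fixLine(String line) {
        // $1 and $2 in the replacement refer to the captured groups
        return SELF_CLOSED.matcher(line).replaceAll("<a name=\"$1\">$2</a>");
    }

    public static void main(String[] args) {
        String line = "<p><a name=\"Summary\"/><span class=\"sectiontitle\">Summary</span></p>";
        System.out.println(fixLine(line));
        // prints <p><a name="Summary"><span class="sectiontitle">Summary</span></a></p>
    }
}
```

Because fixLine returns the complete corrected line, it could be applied to each line read from the file and the result written to an output file, instead of only printing the matched fragment.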