
Java Find and Replace String using Regex

I need to specify the find string as a regex so that the opening tag is found however it is formatted, for example <html >, <html>, or < html>. How do I write the find string as a regex?

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
String find = "<html>";
String replace = "";        
Pattern pattern = Pattern.compile(find);        
Matcher matcher = pattern.matcher(source);        
String output = matcher.replaceAll(replace); 
System.out.println("Source = " + source);
System.out.println("Output = " + output);

Although you could work around your problem with the pattern <\\s*html\\s*> (as written in a Java string literal), you should not process HTML with regex. Obligatory link.

The \\s* matches zero or more whitespace characters.
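As a minimal sketch of that pattern in use, the class below strips the opening tag in all three spellings from the question. The class name HtmlTagStripper and the method stripOpeningHtmlTag are hypothetical names chosen for illustration; only java.util.regex from the standard library is used.

```java
import java.util.regex.Pattern;

public class HtmlTagStripper {
    // '<', optional whitespace, the literal "html", optional whitespace, '>'
    private static final Pattern OPEN_HTML = Pattern.compile("<\\s*html\\s*>");

    // Hypothetical helper: removes every opening html tag matched by the pattern.
    static String stripOpeningHtmlTag(String source) {
        return OPEN_HTML.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String[] samples = { "<html>text", "<html >text", "< html>text" };
        for (String sample : samples) {
            // Each sample prints: text
            System.out.println(stripOpeningHtmlTag(sample));
        }
    }
}
```

Note that this pattern only covers the opening tag, so with the source string from the question the closing </html > is left in place.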

Do not attempt to parse HTML using regex! Try reading about XPath; it is very helpful. XPath tooling expects a well-formed document by default, but you can run the HTML through HtmlCleaner first to make it valid.
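A rough sketch of the XPath route, using only the JDK's built-in javax.xml classes and assuming the input is already well-formed XML (which the <html > ... </html > string from the question happens to be). The class name XPathTextExtractor and the method htmlText are hypothetical; HtmlCleaner is not used here, so malformed input such as < html> is rejected rather than repaired.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTextExtractor {
    // Hypothetical helper: parses the markup as XML and returns the text of the root html element.
    static String htmlText(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Evaluating a node path as a string yields the node's concatenated text content.
            return xpath.evaluate("/html", doc);
        } catch (Exception e) {
            // Assumption: callers treat any parse or query failure as bad input.
            throw new IllegalArgumentException("Could not parse or query markup", e);
        }
    }

    public static void main(String[] args) {
        String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
        System.out.println(htmlText(source));
    }
}
```

Because the parser handles the whitespace inside the tags itself, no pattern for the tag spelling is needed at all.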

To extract text inside your tags use something like

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
System.out.println( source.replaceAll( "^<\\s*html\\s*>(.*)<\\s*\\/html\\s*>$", "$1" ) );
// output is:
// The quick brown fox jumps over the brown lazy dog.

But try to avoid parsing HTML with regular expressions. Read this topic.

This example may be helpful to you.

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
// Lazily matches any tag: '<', then as few characters as possible, then '>'
String find = "\\<.*?>";
String replace = "";
Pattern pattern = Pattern.compile(find);
Matcher matcher = pattern.matcher(source);
String output = matcher.replaceAll(replace);
System.out.println("Source = " + source);
System.out.println("Output = " + output);


 