
Replace all characters in a string that are between two other characters in Java

This is my first time coding in Java, so bear with me. I am trying to write a Java program that opens an HTML file and edits it so that all of its HTML tags are removed, but only the tags and nothing else. I am assuming that the file already exists, so I don't need to create it. For now I have been working with a .txt file that contains HTML code, to get started faster. So far I have managed to edit the file so that it removes the <html> tag and replaces it with nothing. What I really want, however, is to remove anything between an opening and a closing angle bracket. Here is an example of what I need:

<html>
<body>
<p> blah blah blah 
</p> 
</body> 
</html>

After my program has run, the .txt file should contain only "blah blah blah". To replace the tag, I am using:

    if (myString.contains("<html>")) {
        // do stuff
    }

So here is my question: is there something like an escape character in Java that allows me to say:

if (myString.contains("<") && it is followed by any number of characters and then ">")
// then remove the brackets and everything in between them

For the sake of our sanity, let's assume that the HTML code inside the .txt file has no errors. I can post my code if you want, but it is really badly structured and I don't think it would help you understand what I am doing. That is because I have been trying a lot of things at once and have kept whatever might be useful as comments. Thank you for your time!

You can use String.replaceAll with a regular expression.

"<html><p>foo bar</p></html>".replaceAll("</?[A-Za-z]+>", "");

Results in:

foo bar

However, be careful not to try to parse HTML in general with regular expressions.
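
The pattern above only matches tags made up of letters, so a tag such as <h1> or <a href="..."> would be left in place. Below is a minimal sketch of a slightly broader pattern, assuming error-free markup with no > inside attribute values. The class and method names are illustrative and not part of the original answer.

```java
import java.util.regex.Pattern;

final class SimpleTagStripper {
    // Matches an opening or closing tag: '<', an optional '/', a name that
    // starts with a letter, then anything up to the next '>'.
    private static final Pattern TAG = Pattern.compile("</?[A-Za-z][^>]*>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<h1 class=\"title\">foo</h1><a href=\"x.html\">bar</a>";
        System.out.println(stripTags(html)); // prints "foobar"
    }
}
```

Because the tag name must start with a letter, a stray < in ordinary text (as in "a < b") is left alone.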

With JSoup, you can strip all the tags from an HTML page very simply:

Jsoup.parse(myString).text()

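Try a regular expression like this. Any substring that starts with <, ends with >, and contains any number of characters other than > in between is replaced by an empty string, so only the text (blah blah ...) remains. The character class [^>]* is used rather than .* because .* is greedy: it would match from the first < to the last > on a line and remove the text in between as well.

str = str.replaceAll("<[^>]*>", "");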
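
To see how a greedy .* behaves differently from a negated character class when a whole element sits on one line, here is a small comparison. It is only a sketch; the class and method names are made up for illustration.

```java
final class GreedyVersusNegated {
    static String greedy(String s) {
        // '.*' is greedy: it runs on to the last '>' on the line.
        return s.replaceAll("<.*>", "");
    }

    static String negated(String s) {
        // '[^>]*' stops at the first '>', so each tag is matched on its own.
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String line = "<p>blah blah blah</p>";
        System.out.println("[" + greedy(line) + "]");  // prints "[]"
        System.out.println("[" + negated(line) + "]"); // prints "[blah blah blah]"
    }
}
```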

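
Putting this together for the file described in the question, the following sketch reads the file, removes every <...> span, drops the lines left empty, and writes the result back. It assumes error-free markup, UTF-8 encoding, and Java 8 or later; the class name, method names, and command-line argument are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HtmlFileCleaner {
    // Removes every '<...>' span, then trims each line and drops empty ones.
    static String cleanText(String html) {
        String withoutTags = html.replaceAll("<[^>]*>", "");
        StringBuilder out = new StringBuilder();
        for (String line : withoutTags.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                out.append(trimmed).append(System.lineSeparator());
            }
        }
        return out.toString();
    }

    // Overwrites the file in place with its tag-free content.
    static void cleanFile(Path file) throws IOException {
        String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, cleanText(html).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HtmlFileCleaner <file>");
            return;
        }
        cleanFile(Paths.get(args[0]));
    }
}
```

For the example file in the question, cleanText leaves a single line containing "blah blah blah".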

