First time coding Java here, so bear with me :P

I am trying to write a Java program that opens an HTML file and removes all of its HTML tags, and only the tags, leaving everything else intact. I am assuming the file already exists, so I don't need to create it. For now I have been working with a .txt file that contains HTML code, to get started faster. So far I have managed to edit the file so that it removes the <html> tag and replaces it with nothing. What I really want, however, is to remove anything between an opening and a closing angle bracket. Here is an example of what I need:
<html>
<body>
<p> blah blah blah
</p>
</body>
</html>
After my program has run, the .txt file should contain only "blah blah blah". To replace the tag, I am using:
if(myString.contains("<html>"))
{
// do stuff
}
So here is my question: is there something like an escape character in Java that allows me to say:
if(myString.contains("<") && it is followed by as many characters as the file wants by (">") )
//then remove everything in between them.
For the sake of simplicity, let's assume that the HTML code inside the .txt file has no errors. I will post my code if you want me to, but it is really badly structured and I don't think it will help you understand what I am doing. That is because I have been trying a lot of things at once and have kept whatever might be useful as comments. Thank you for your time!
You can use String.replaceAll with a regular expression.
"<html><p>foo bar</p></html>".replaceAll("</?[A-Za-z]+>", "");
Results in:
foo bar
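However, be careful not to try to parse HTML with regular expressions: a pattern like this only matches simple tags and cannot reliably handle arbitrary markup (attributes, comments, malformed tags).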
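To make that caveat concrete, here is a minimal sketch comparing the pattern from this answer with a broader one. The class name, method names, and sample input are made up for illustration, and the broader pattern `<[^>]*>` is an assumption rather than part of the answer. It is still not an HTML parser; it only treats anything between `<` and the next `>` as a tag.

```java
public class TagPatternDemo {
    // Pattern from the answer above: matches only bare tags made of letters,
    // such as <p> or </p>.
    public static String stripSimple(String html) {
        return html.replaceAll("</?[A-Za-z]+>", "");
    }

    // Broader pattern (an assumption, not from the answer): "<", then any run
    // of characters other than ">", then ">".
    public static String stripAny(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<h1 class=\"t\">Title</h1><p>foo bar</p>";
        // Tags with attributes or digits in the name are left untouched.
        System.out.println(stripSimple(html)); // <h1 class="t">Title</h1>foo bar
        System.out.println(stripAny(html));    // Titlefoo bar
    }
}
```

Note that the text of adjacent elements is joined without a separator ("Titlefoo bar"), which is one more reason a real parser is usually the safer choice.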
With JSoup, you can strip all the tags from an HTML page very simply:
Jsoup.parse(myString).text()
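Try a regular expression like this. Any substring that starts with <, ends with the next >, and contains any number of other characters in between is replaced by an empty string, so only the blah blah ... text remains. Note that [^>]* is used rather than .*, because the greedy .* would match from the first < to the last > on a line and delete the text in between as well.
str = str.replaceAll("<[^>]*>", "");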
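The choice of quantifier matters when a line contains more than one tag. The sketch below compares the greedy `<.*>` with the negated character class `<[^>]*>`, then applies the latter to a whole file, which is what the question asks for. The class name, the method names, and the `stripFile` helper are hypothetical, and the file is assumed to be UTF-8 and small enough to read into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StripTags {
    // Greedy: ".*" runs to the LAST ">" on the line, so text between tags is lost.
    public static String stripGreedy(String s) {
        return s.replaceAll("<.*>", "");
    }

    // Negated class: "[^>]*" stops at the FIRST ">", so each tag is removed separately.
    public static String stripTags(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    // Hypothetical helper: rewrites the file in place with all tags removed.
    // Assumes UTF-8 content that fits in memory.
    public static void stripFile(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, stripTags(content).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String line = "<p>blah blah blah</p>";
        System.out.println("[" + stripGreedy(line) + "]"); // []
        System.out.println("[" + stripTags(line) + "]");   // [blah blah blah]
        if (args.length == 1) {
            stripFile(Paths.get(args[0])); // path supplied on the command line
        }
    }
}
```

Line breaks and spaces that surrounded the tags stay in the output, so a call to `trim()` or a pass that drops blank lines may be wanted afterwards.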