
Java Safelist Add <head> Tag to Allowed List

I want to build an allowlist that strips all HTML tags from a string except head, body, and i. To do that, I used the jsoup library and its Safelist class.

import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

Safelist safe_list = Safelist.none();
safe_list.addTags("head", "body", "i");
// Java string literals cannot span lines, so the markup is concatenated.
String data = "<head>Title here</head>"
            + "<body>"
            + "<p><b> paragraph 1</b></p>"
            + "<p><i> paragraph 2</i></p>"
            + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
System.out.println(cleaned_data);

The result I expected was:

<head>
 Title here
</head>
<body>
 paragraph 1 <i>paragraph 2</i>
</body>

but the result I got was:

<body>
 Title here paragraph 1 <i>paragraph 2</i>
</body>

Although the head tag is in the allowed list, it is removed from the data, unlike the body and i tags. What is the problem with the head tag, and what should I do to keep it in the output?

I found a solution. It may not be the exact solution, but it works in my case. The official jsoup documentation says the following:

The cleaner and these safelists assume that you want to clean a body fragment of HTML (to add user supplied HTML into a templated page), and not to clean a full HTML document. If the latter is the case, either wrap the document HTML around the cleaned body HTML, or create a safelist that allows html and head elements as appropriate.

Because creating a safelist that allows html and head elements did not work for me, I took the first suggestion and added the head around the cleaned body myself:

Safelist safe_list = Safelist.none();
safe_list.addTags("body", "i");
String data = "<body>"
            + "<p><b> paragraph 1</b></p>"
            + "<p><i> paragraph 2</i></p>"
            + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
// Prepend the head only after cleaning, so the cleaner never sees it.
cleaned_data = "<head>Title here</head>" + cleaned_data;
System.out.println(cleaned_data);

https://jsoup.org/apidocs/org/jsoup/safety/Safelist.html
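
The "wrap the document HTML around the cleaned body HTML" suggestion can be sketched as a small helper. This is only a sketch under a few assumptions: DocumentWrapper, escapeText, and wrap are hypothetical names and not part of jsoup, the body fragment is assumed to have already gone through Jsoup.clean, and the title is assumed to be plain text that should be escaped rather than treated as markup.

```java
// Hypothetical helper (not part of jsoup): wraps an already-cleaned body
// fragment in a full HTML document with a plain-text title.
final class DocumentWrapper {

    // Escape the characters that are significant in HTML text content.
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // cleanedBody is assumed to be the output of Jsoup.clean(...), so it is
    // inserted as-is; the title is escaped here because it is not cleaned.
    static String wrap(String title, String cleanedBody) {
        return "<html>\n"
             + "<head><title>" + escapeText(title) + "</title></head>\n"
             + "<body>\n" + cleanedBody + "\n</body>\n"
             + "</html>";
    }

    public static void main(String[] args) {
        // In real use this would be: Jsoup.clean(data, safe_list)
        String cleanedBody = "paragraph 1 <i>paragraph 2</i>";
        System.out.println(wrap("Title here", cleanedBody));
    }
}
```

Keeping the head outside the cleaning step means only the untrusted body content passes through the safelist, while the head is built from values you control.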

The proper structure of an HTML file is:

<html>
 <head>
   <title>Page Title</title>
 </head>
 <body>
 </body>
</html>

Your code should therefore be written this way:

Safelist safe_list = Safelist.none();
safe_list.addTags("head", "body", "i");
// The title text is placed inside a <title> element within <head>.
String data = "<head><title>Title here</title></head>"
            + "<body>"
            + "<p><b> paragraph 1</b></p>"
            + "<p><i> paragraph 2</i></p>"
            + "</body>";
String cleaned_data = Jsoup.clean(data, safe_list);
System.out.println(cleaned_data);

When you write just <head>Title here</head>, jsoup treats the text between the tags as a text node.

