
Regex to block all < in a String

I'm trying to create a regex that blocks all < and > characters in a String, except when they are used as <select>. Can anyone suggest a regex for that? I'll be using it with java.util.regex.Pattern.

I'm trying to block injection attacks and XSS attempts that arrive through the request and the URL. To do that, I'll block special characters and character sequences, with some exceptions. One exception is that I have to allow <select> (angle brackets with select between them), because it is legitimately passed in the request in some cases. All other combinations of angle brackets have to be blocked, which is the reason for my question.

This removes the < and > characters from the string, except when they are part of <select> as you mentioned:

someString.replaceAll("<(?!select>)|(?<!\\<select)>", "");

Pattern p = Pattern.compile(
  "(?<!\\<select)>|<(?!\\s*select\\s*>)",
  Pattern.CASE_INSENSITIVE);

This will find any > not preceded by <select and any < not followed by select> (optionally with white-space around select). The CASE_INSENSITIVE flag makes the match case-insensitive.
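As a minimal sketch of how a pattern like this might be used, here is a small helper. The class and method names are made up for illustration, and the white-space handling is left out so that the lookahead and lookbehind stay symmetric:

```java
import java.util.regex.Pattern;

class AngleBracketFilter {
    // Matches '>' not preceded by "<select", or '<' not followed by "select>".
    private static final Pattern STRAY_BRACKET = Pattern.compile(
        "(?<!<select)>|<(?!select>)",
        Pattern.CASE_INSENSITIVE);

    // Removes every angle bracket that is not part of a <select> tag.
    static String strip(String input) {
        return STRAY_BRACKET.matcher(input).replaceAll("");
    }

    // Reports whether the input contains a disallowed angle bracket.
    static boolean containsStrayBracket(String input) {
        return STRAY_BRACKET.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(strip("a <b> c <SELECT> d"));        // a b c <SELECT> d
        System.out.println(containsStrayBracket("<select>"));   // false
        System.out.println(containsStrayBracket("<script>"));   // true
    }
}
```

Stripping characters is not the same as rejecting the request. containsStrayBracket is included in case the request should be refused outright instead of cleaned.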

Normally I'd also check for (legal) white-space around the element name (" < select > " is valid), but the lookbehind has issues with that which I'm not sure how to get around.
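One possible way around the lookbehind problem, offered as a sketch and not as a definitive fix, is to drop the lookarounds altogether. The pattern matches either a whole <select> tag (with optional white-space) in a capture group, or any single angle bracket, and every match is replaced with the contents of group 1. In Java an unmatched group expands to nothing in the replacement, so stray brackets vanish while the tag is put back unchanged. The class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class SelectAwareStripper {
    // Alternative 1 (group 1): a complete <select> tag, white-space allowed.
    // Alternative 2: any single '<' or '>' that alternative 1 did not consume.
    private static final Pattern TOKEN = Pattern.compile(
        "(<\\s*select\\s*>)|[<>]",
        Pattern.CASE_INSENSITIVE);

    static String strip(String input) {
        // "$1" restores the tag when group 1 matched and is empty otherwise.
        return TOKEN.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("a <b> < select > c <SELECT>"));
        // a b < select > c <SELECT>
    }
}
```

Because the tag alternative is tried first at each position, the brackets of a well-formed tag are consumed together and never reach the single-bracket alternative.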

I suspect it can be done with a single regex, but it may be easier to split it into stages, e.g.:

  1. "@" => "@0"
  2. "<select>" => "@1"
  3. "<" => ""
  4. ">" => ""
  5. "@1" => "<select>"
  6. "@0" => "@"

Note: these are all literal strings not regex patterns. I have arbitrarily chosen "@" as an escape character but it can be anything.
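The six stages map directly onto chained String.replace calls, which treat both arguments as literal text and not as regex patterns. This is only a sketch: the class and method names are invented, and because the replacement is literal it is also case-sensitive, so <SELECT> would lose its brackets.

```java
class StagedBracketStripper {
    static String strip(String input) {
        return input
            .replace("@", "@0")          // 1. escape the escape character
            .replace("<select>", "@1")   // 2. protect the allowed tag
            .replace("<", "")            // 3. drop remaining '<'
            .replace(">", "")            // 4. drop remaining '>'
            .replace("@1", "<select>")   // 5. restore the allowed tag
            .replace("@0", "@");         // 6. restore the escape character
    }

    public static void main(String[] args) {
        System.out.println(strip("a <b> c <select> @ d"));   // a b c <select> @ d
    }
}
```

Escaping "@" first is what keeps a literal "@1" in the input from being mistaken for a protected tag in stage 5.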

Example: "a <b> c <select> @ d"
step 1
"a <b> c <select> @0 d"
step 2
"a <b> c @1 @0 d"
step 3
"a b> c @1 @0 d"
step 4
"a b c @1 @0 d"
step 5
"a b c <select> @0 d"
step 6
"a b c <select> @ d"


 