Find Oracle single line comments except the ones that appear inside a string.
For example:
-- This is a valid single line comment
But
'This is a string -- and it is not a comment';
I am using this regex to find single-line comments:
--.*$
It handles a few cases, but it fails on several more complex ones. You can use this script for reference:
-- this is a single line comment
CREATE OR REPLACE PROCEDURE "MAIL_WITH_ATTACHMENT" ( )
IS
tmp varchar(2) ; -- this is a comment
tmp1 varchar(2) := 'some texxt'; -- this is another comment
tmp2 varchar(3) := 'some more --text'; -- this is one more comment
tmp3 varchar(4) := 'this regex isn't --working properly'; -- Don't you think this is another comment
BEGIN
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
mesg:= crlf ||
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
END;
The result must be this:
[1] : -- this is a single line comment
[2] : -- this is a comment
[3] : -- this is another comment
[4] : -- this is one more comment
[5] : -- Don't you think this is another comment
Thanks
Personally, I'd use an SQL parser to strip these comments. The problem with regex is that it's not really aware of its surroundings: regex has a hard time figuring out if a single quote is inside a comment, or if -- is inside a string literal.
You can circumvent this by using a regex that matches from the start of each line and matches string literals as well, making it behave more like a lexical analyzer (the first stage of parsing).
Such a regex could look like this:
(?m)^((?:(?!--|').|'(?:''|[^'])*')*)--.*$
A quick breakdown of the regex:
(?m) # enable multi-line mode
^ # match the start of the line
( # start match group 1
(?: # start non-capturing group 1
(?!--|'). # if there's no '--' or single quote ahead, match any char (except a line break)
| # OR
'(?:''|[^'])*' # match a string literal
)* # end non-capturing group 1 and repeat it zero or more times
) # end match group 1
--.*$ # match a comment all the way to the end of the line
In plain English that would read like: from each start of a line, try to match zero or more string literals ( '(?:''|[^'])*' ) or characters that are neither a single quote nor the start of a comment ( (?!--|'). ), and store this match in group 1. Then match a comment ( --.*$ ).
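As a side note, the annotated breakdown does not have to live only in prose. Java's Pattern.COMMENTS flag ignores whitespace and # remarks inside a pattern, so a minimal sketch like the one below can keep the explanation next to the regex. The class and method names are made up for illustration, and the capture group is placed around the comment here, so this variant extracts comments rather than stripping them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; only the regex itself comes from the answer.
final class AnnotatedCommentRegex {
    // Pattern.COMMENTS ignores whitespace and "# ..." remarks inside the pattern,
    // so the breakdown can stay next to the pieces it describes.
    static final Pattern COMMENT = Pattern.compile(
          "^                 # start of a line (needs MULTILINE)\n"
        + "(?:               # repeat zero or more times:\n"
        + "  (?!--|').       #   a char that begins neither a comment nor a string\n"
        + "  |               #   OR\n"
        + "  '(?:''|[^'])*'  #   a whole string literal, '' being an escaped quote\n"
        + ")*\n"
        + "(--.*)$           # group 1: the comment, up to the end of the line\n",
        Pattern.MULTILINE | Pattern.COMMENTS);

    // Collects group 1 of every match, i.e. one entry per commented line.
    static List<String> extractComments(String sql) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(sql);
        while (m.find()) {
            comments.add(m.group(1));
        }
        return comments;
    }

    public static void main(String[] args) {
        String sql = "tmp2 := 'some more --text'; -- one more comment\n"
                   + "tmp3 := 'isn''t --working'; -- Don't you think\n"
                   + "'--This is a Mime message' || crlf";
        extractComments(sql).forEach(System.out::println);
    }
}
```

For the three sample lines in main, only the two trailing comments are reported; the -- sequences inside the string literals are skipped.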
So now all you need to do is replace this pattern with whatever is matched in group 1. A demo:
String sql = "-- this is a single line comment\n" +
"\n" +
"CREATE OR REPLACE PROCEDURE \"MAIL_WITH_ATTACHMENT\" ( ) \n" +
"IS \n" +
"tmp varchar(2) ; -- this is a comment \n" +
"tmp1 varchar(2) := 'some texxt'; -- this is another comment\n" +
"tmp2 varchar(3) := 'some more --text'; -- this is one more comment\n" +
"tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment\n" +
"BEGIN\n" +
"\n" +
" '--This is a Mime message, which your current mail reader may not' || crlf ||\n" +
" ' some more -- characters in a string';\n" +
"\n" +
" mesg:= crlf ||\n" +
" '--This is a Mime message, which your current mail reader may not' || crlf ||\n" +
" ' some more -- characters in a string';\n" +
"END; ";
String stripped = sql.replaceAll("(?m)^((?:(?!--|').|'(?:''|[^'])*')*)--.*$", "$1[REMOVED COMMENT]");
System.out.println(stripped);
which will print:
[REMOVED COMMENT]
CREATE OR REPLACE PROCEDURE "MAIL_WITH_ATTACHMENT" ( )
IS
tmp varchar(2) ; [REMOVED COMMENT]
tmp1 varchar(2) := 'some texxt'; [REMOVED COMMENT]
tmp2 varchar(3) := 'some more --text'; [REMOVED COMMENT]
tmp3 varchar(4) := 'this regex isn''t --working properly'; [REMOVED COMMENT]
BEGIN
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
mesg:= crlf ||
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
END;
And if you only want to extract the comments, wrap the capture group around --.*$ and use a Pattern & Matcher to find() the matches:
Matcher m = Pattern.compile("(?m)^(?:(?!--|').|'(?:''|[^'])*')*(--.*)$").matcher(sql);
while(m.find()) {
System.out.println(m.group(1));
}
which will print:
-- this is a single line comment
-- this is a comment
-- this is another comment
-- this is one more comment
-- Don't you think this is another comment
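If the regex becomes hard to maintain, the "lexical analyzer" idea mentioned earlier can also be written by hand. The following is only a rough sketch under simplifying assumptions: it knows about single-quoted literals (with '' as an escaped quote) and -- comments, and nothing else. Block comments (/* ... */), double-quoted identifiers, and Oracle's q'[...]' quoting are deliberately not handled, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner; not a full Oracle SQL lexer.
final class LineCommentScanner {
    /** Returns every "--" comment that starts outside a single-quoted literal. */
    static List<String> findComments(String sql) {
        List<String> comments = new ArrayList<>();
        boolean inString = false;
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            if (inString) {
                if (c == '\'') {
                    // '' inside a literal is an escaped quote, a lone ' closes it.
                    if (i + 1 < sql.length() && sql.charAt(i + 1) == '\'') {
                        i += 2;
                        continue;
                    }
                    inString = false;
                }
                i++;
            } else if (c == '\'') {
                inString = true;
                i++;
            } else if (sql.startsWith("--", i)) {
                // A comment runs to the end of the current line.
                int end = sql.indexOf('\n', i);
                if (end < 0) {
                    end = sql.length();
                }
                String comment = sql.substring(i, end);
                if (comment.endsWith("\r")) {
                    comment = comment.substring(0, comment.length() - 1);
                }
                comments.add(comment);
                i = end;
            } else {
                i++;
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String sql = "a := 'x --y'; -- c1\n"
                   + "b := 'isn''t --z'; -- Don't\n"
                   + "'--s' || crlf";
        findComments(sql).forEach(System.out::println);
    }
}
```

The trade-off is more code in exchange for explicit state: the inString flag makes it obvious why a -- inside a literal is skipped, and new token types can be added as further branches.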
This should help if you read the input line by line:
str = str.replaceAll("'{1}.*'{1}", "").replaceFirst(".*--", "--");
Input: -sd '--asdsa ---asdsadasdsad' || ' asdsad' || 'asdsadasd '--here x something
Output: --here x something
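To get a feel for where this two-step replacement holds up, it can be wrapped in a small helper and run against a few lines. The sketch below does that; the helper name commentOf is made up for illustration. It suggests two things worth keeping in mind: a line without any comment comes back unchanged, so the caller still has to check for a leading --, and because '.*' is greedy, an apostrophe inside the comment itself can swallow the comment.

```java
// Hypothetical wrapper around the two replacements shown above.
final class GreedyLineStripper {
    static String commentOf(String line) {
        // Step 1: drop everything from the first to the last single quote.
        // Step 2: drop everything before the last remaining "--".
        return line.replaceAll("'{1}.*'{1}", "").replaceFirst(".*--", "--");
    }

    public static void main(String[] args) {
        // Sample input from the answer.
        System.out.println(commentOf(
            "-sd '--asdsa ---asdsadasdsad' || ' asdsad' || 'asdsadasd '--here x something"));
        // No comment on the line: the line is returned as-is.
        System.out.println(commentOf("BEGIN"));
        // Apostrophe inside the comment: the comment is lost.
        System.out.println(commentOf("tmp1 := 'x'; -- Don't stop"));
    }
}
```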
Edit: Final version after 3 edits :)
This regex should work fine:
Pattern p = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");
except for the case [5]. But before starting to overcomplicate the regex, are you sure that Oracle doesn't complain about that string?
EDIT
This is the code I've used to test the regex
String[] text =
{
"-- this is a single line comment",
"",
"CREATE OR REPLACE PROCEDURE \"MAIL_WITH_ATTACHMENT\" ( ) ",
"IS ",
"tmp varchar(2) ; -- this is a comment ",
"tmp1 varchar(2) := 'some texxt'; -- this is another comment",
"tmp2 varchar(3) := 'some more --text'; 'blah --blah' -- this is one more comment",
"tmp3 varchar(4) := 'this regex isn't --working properly'; -- Don't you think this is another comment",
"BEGIN",
"",
" '--This is a Mime message, which your current mail reader may not' || crlf ||",
" ' some more -- characters in a string';",
"",
" mesg:= crlf ||",
" '--This is a Mime message, which your current mail reader may not' || crlf ||",
" ' some more -- characters in a string';", "END; ", };
Pattern p = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");
Matcher m = p.matcher("");
for (String s : text) {
m.reset(s);
if (m.find()) {
System.out.println(m.group(m.groupCount()));
}
}
And here's the output:
-- this is a single line comment
-- this is a comment
-- this is another comment
-- this is one more comment
--working properly'; -- Don't you think this is another comment
As you can see, the last line of the output is "wrong". But, as you said, Oracle doesn't like such a string either. Once you correct isn't into isn''t, the output will also be correct.
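The effect of that correction can be checked with a short sketch that runs the same regex over both versions of the tmp3 line. The class and method names are hypothetical; the pattern is the one from this answer, with the comment taken from group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check harness for the regex from this answer.
final class EvenQuoteRegexCheck {
    static final Pattern P = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");

    // Returns the matched comment, or null when the line has none.
    static String commentOf(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String tail = " --working properly'; -- Don't you think this is another comment";
        // Unescaped apostrophe: the match starts inside the string literal.
        System.out.println(commentOf("tmp3 varchar(4) := 'this regex isn't" + tail));
        // Escaped apostrophe (''): only the real comment is returned.
        System.out.println(commentOf("tmp3 varchar(4) := 'this regex isn''t" + tail));
    }
}
```

The regex requires an even number of single quotes before the --, which is why the doubled quote in isn''t keeps the quotes balanced and moves the match to the real comment.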