Java regular expression to match valid Java identifiers
I need to create a regular expression that can find and extract valid identifiers in Java code like this:
int a, b, c;
float d, e;
a = b = 5;
c = 6;
if ( a > b)
{
c = a - b;
e = d - 2.0;
}
else
{
d = e + 6.0;
b = a + c;
}
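I tried combining several regexes into a single one, but how can I build a pattern that excludes reserved words?
I tried this regex:
^(((&&|<=|>=|<|>|,=|==|&|.)|([-+=]{1?2})|([;,.)}{;,(-]))|(else|if|float|int)|(\d[\d.]))
but it does not work as expected.
How should I match the identifiers?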
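A valid Java identifier (ASCII only; the Java language also allows Unicode letters):
1. has at least one character;
2. starts with a letter [a-zA-Z], an underscore _, or a dollar sign $;
3. continues with letters, digits, underscores, or dollar signs;
4. is not a reserved word;
5. is not a single underscore _, which has been a keyword since Java 9.
A naive regex that checks the first three conditions is: (\b([A-Za-z_$][$\w]*)\b)
but it does not filter out the reserved words.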
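To see what the naive pattern reports, here is a minimal sketch. The class name NaiveIdentifierDemo and the helper find are made up for illustration, and Matcher.results() assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NaiveIdentifierDemo {
    // The naive pattern: one start character, then any number of "rest" characters
    static final Pattern NAIVE = Pattern.compile("\\b([A-Za-z_$][$\\w]*)\\b");

    // Hypothetical helper: returns every token the pattern matches, in order of appearance
    static List<String> find(String code) {
        return NAIVE.matcher(code).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(find("int a, b;"));             // prints [int, a, b]
        System.out.println(find("if (a > b) c = a - b;")); // prints [if, a, b, c, a, b]
    }
}
```

The reserved words int and if show up next to the real identifiers, which is exactly the problem the rest of the answer deals with.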
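To exclude the reserved words, a negative lookahead (?!) is needed to specify a group of tokens that must not match: \b(?!(_\b|if|else|for|float|int))([A-Za-z_$][$\w]*)
where:
(?!(_\b|if|else|for|float|int)) excludes the listed words;
([A-Za-z_$][$\w]*) matches the identifier.
However, the word boundary \b does not treat the dollar sign $ as a word character, so there is no boundary between, say, a space and a following $, and this regex fails to match identifiers that start with $.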
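The effect of the \b version can be checked with a short sketch. The class name LookaheadDemo and the helper find are hypothetical; the pattern is the one quoted above with its shortened keyword list.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LookaheadDemo {
    // Word boundary, then a negative lookahead over a shortened keyword list
    static final Pattern WITH_BOUNDARY = Pattern.compile(
            "\\b(?!(_\\b|if|else|for|float|int))([A-Za-z_$][$\\w]*)");

    // Hypothetical helper: returns every token the pattern matches, in order of appearance
    static List<String> find(String code) {
        return WITH_BOUNDARY.matcher(code).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "if" is filtered out now
        System.out.println(find("if (a > b) c = a - b;")); // prints [a, b, c, a, b]
        // "$1" is lost: there is no word boundary between the space and '$'
        System.out.println(find("int $1, a;"));            // prints [a]
    }
}
```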
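In addition, we may want to exclude matches inside string and character literals ("not_a_variable", 'c', '\u65').
This can be done by replacing the word boundary \b with a positive lookbehind (?<=), which checks the text just before the main expression without including it in the result: (?<=[^$\w'"\\])(?!(_\b|if|else|for|float|int))([A-Za-z_$][$\w]*)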
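A small sketch of the lookbehind version follows; the class name LookbehindDemo and the helper find are again made up for illustration. It also shows two limits worth knowing: the lookbehind only screens the token that directly follows a quote or backslash, and it needs at least one character before the match, so a token at the very start of the input is never reported.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LookbehindDemo {
    // Lookbehind: the previous character must not be $, a word character, a quote, or a backslash
    static final Pattern WITH_LOOKBEHIND = Pattern.compile(
            "(?<=[^$\\w'\"\\\\])(?!(_\\b|if|else|for|float|int))([A-Za-z_$][$\\w]*)");

    // Hypothetical helper: returns every token the pattern matches, in order of appearance
    static List<String> find(String code) {
        return WITH_LOOKBEHIND.matcher(code).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "$1" is found now; "x" and 'y' directly follow a quote and are skipped
        System.out.println(find("int $1 = 2; if ($1 > 0) print(\"x\", 'y');")); // prints [$1, $1, print]
        // Limitation: only the first word after the quote is screened
        System.out.println(find(" s = \"two words\";"));                       // prints [s, words]
    }
}
```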
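Next, the full list of Java reserved words can be collected into a single string of tokens separated by |.
A test class is provided below; it shows the final regex pattern and how it is used to detect Java identifiers.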
import java.util.Arrays;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class IdFinder {
    // Java keywords plus the literals true, false and null;
    // "_\\b" is the lone underscore, a keyword since Java 9
    static final List<String> RESERVED = Arrays.asList(
        "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char", "class", "const",
        "continue", "default", "double", "do", "else", "enum", "extends", "false", "final", "finally",
        "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long",
        "native", "new", "null", "package", "private", "protected", "public", "return", "short", "static",
        "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "true", "try",
        "void", "volatile", "while", "_\\b"
    );

    // Alternation of all reserved words, used inside the negative lookahead
    static final String JAVA_KEYWORDS = String.join("|", RESERVED);

    // Lookbehind: not preceded by $, a word character, a quote, or a backslash
    static final Pattern VALID_IDENTIFIERS = Pattern.compile(
        "(?<=[^$\\w'\"\\\\])(?!(" + JAVA_KEYWORDS + "))([A-Za-z_$][$\\w]*)");

    public static void main(String[] args) {
        System.out.println("ID pattern:\n" + VALID_IDENTIFIERS.pattern());
        String code = "public class Main {\n\tstatic int $1;\n\tprotected char _c0 = '\\u65';\n\tprivate long c1__$$;\n}";
        System.out.println("\nIdentifiers in the following code:\n=====\n" + code + "\n=====");
        // Matcher.results() requires Java 9 or later
        VALID_IDENTIFIERS.matcher(code).results()
            .map(MatchResult::group)
            .forEach(System.out::println);
    }
}
Output
ID pattern:
(?<=[^$\w'"\\])(?!(abstract|assert|boolean|break|byte|case|catch|char|class|const|continue|default|double|do|else|enum|extends|false|final|finally|float|for|goto|if|implements|import|instanceof|int|interface|long|native|new|null|package|private|protected|public|return|short|static|strictfp|super|switch|synchronized|this|throw|throws|transient|true|try|void|volatile|while|_\b))([A-Za-z_$][$\w]*)
Identifiers in the following code:
=====
public class Main {
	static int $1;
	protected char _c0 = '\u65';
	private long c1__$$;
}
=====
Main
$1
_c0
c1__$$
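One case the sample code does not exercise: the alternatives in the negative lookahead have no terminator, so an identifier that merely begins with a reserved word (integer, format) is rejected as well. A possible variant, which is an assumption and not part of the answer above, ends the alternation with (?![$\w]) so that a word is rejected only when it is the whole token. The class name StrictIdFinder, the helper find and the shortened keyword list are hypothetical; in practice the full RESERVED list would be used, with a plain "_" in place of "_\\b".

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StrictIdFinder {
    // Shortened keyword list, for illustration only
    static final String KEYWORDS = "_|if|else|for|float|int";

    // Style used above: a reserved word as a prefix is enough to reject the token
    static final Pattern PREFIX = Pattern.compile(
            "(?<=[^$\\w'\"\\\\])(?!(_\\b|if|else|for|float|int))([A-Za-z_$][$\\w]*)");

    // Hypothetical variant: reject only if no identifier character follows the reserved word
    static final Pattern WHOLE = Pattern.compile(
            "(?<=[^$\\w'\"\\\\])(?!(?:" + KEYWORDS + ")(?![$\\w]))([A-Za-z_$][$\\w]*)");

    // Hypothetical helper: returns every token the given pattern matches, in order of appearance
    static List<String> find(Pattern pattern, String code) {
        return pattern.matcher(code).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String code = " int integer, format, _, _x, int$;";
        System.out.println(find(PREFIX, code)); // prints [_x]
        System.out.println(find(WHOLE, code));  // prints [integer, format, _x, int$]
    }
}
```

The guard (?![$\w]) is used instead of \b because \b would also fire between a reserved word and a following $, which would wrongly reject a valid name such as int$.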