
Java regex find variable name outside a string

I want to find all occurrences of a variable name in a file, say the variable test:

 int test;

But I don't want to match the variable name when it's inside a string, like

String s = "This is a test!";

I tried ([^\\"])([a-zA-Z_$][\\\\w$]*)([^\\"]), but it doesn't work.

I'm afraid regular expressions are not the best fit for your problem. Parsing source code involves a lot of semantics, so it is very unlikely that you can come up with a reliable expression that doesn't get confused by things like escaped quotes within strings.

A better way to parse source code (and reliably detect things like variable names) is to use a generated parser that knows the grammar of the file being parsed. SableCC is designed for this, and it conveniently provides a grammar file for Java 1.5.

It basically tokenizes the given source code and adds type information to each token. This way you can simply iterate over all tokens and rebuild the source, replacing every token that matches your search term and is of type variable.
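The token-iteration idea can be illustrated without a generated parser. The sketch below is not SableCC; as a stand-in it uses the JDK's java.io.StreamTokenizer, which only distinguishes words, numbers, quoted strings and comments and knows nothing about Java's grammar or about which words are variables. The class name TokenSearch and the method findIdentifier are made up for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a rough stand-in for a real parser, not SableCC.
class TokenSearch {

    /** Returns the 1-based line numbers on which `name` appears as a word token. */
    static List<Integer> findIdentifier(String source, String name) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.ordinaryChar('/');          // '/' starts a comment by default; undo that
        st.slashSlashComments(true);   // skip // comments
        st.slashStarComments(true);    // skip /* */ comments
        st.ordinaryChar('.');          // keep "a.b" from being read as one word
        st.ordinaryChar('-');          // keep "a-1" from being read as one word
        st.wordChars('_', '_');        // '_' and '$' are legal in Java identifiers
        st.wordChars('$', '$');

        List<Integer> lines = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // Quoted strings arrive with ttype '"' or '\'', so they never match here.
                if (st.ttype == StreamTokenizer.TT_WORD && name.equals(st.sval)) {
                    lines.add(st.lineno());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String src = "int test;\n"
                   + "String s = \"This is a test!\";\n"
                   + "// test in a comment\n"
                   + "test = 1;\n";
        System.out.println(findIdentifier(src, "test"));
    }
}
```

This reports every word token with the given spelling, so a method or class with the same name is reported too, and a literal such as 6d is split into a number followed by the word d. Telling variables apart from other identifiers is exactly what a grammar-based parser adds.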

As I said in the comment, using regex for this is generally not a good idea. You should use some kind of parser instead.

But anyway here is a simple hack that will work for some cases:

(?xm) \b test \b
(?=
    (?:[^\n"\\]+|\\.)*
    (?:(?:"(?:[^\n"\\]+|\\.)*){2})*
    $
)

Java quoted:

"(?m)\\btest\\b(?=(?:[^\n\"\\\\]+|\\\\.)*(?:(?:\"(?:[^\n\"\\\\]+|\\\\.)*){2})*$)"

Comments and some other constructs will break it, for example a quote character inside a trailing comment or a char literal such as '"'.
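To see the lookahead pattern above in action, here is a small sketch that compiles it for an arbitrary name and collects the match offsets. The class OutsideStringRegex and its methods are invented for the example, and Pattern.quote is my addition so that the name is taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the lookahead hack; names are illustrative only.
class OutsideStringRegex {

    /** Builds the pattern: the word, followed by an even number of quotes up to end of line. */
    static Pattern outsideStrings(String name) {
        return Pattern.compile(
            "(?m)\\b" + Pattern.quote(name) + "\\b"
            + "(?=(?:[^\n\"\\\\]+|\\\\.)*"                    // rest of the current segment
            + "(?:(?:\"(?:[^\n\"\\\\]+|\\\\.)*){2})*$)");     // then quotes only in pairs
    }

    /** Returns the start offsets of all matches in `source`. */
    static List<Integer> findOffsets(String source, String name) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = outsideStrings(name).matcher(source);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String src = "int test;\n"
                   + "String s = \"This is a test!\";\n"
                   + "test = 1;";
        System.out.println(findOffsets(src, "test")); // offsets of the 1st and 3rd line's "test"
    }
}
```

The pattern works line by line, so it assumes string literals do not span lines. Because it nests quantifiers, it may also backtrack heavily on long lines where the lookahead fails.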

One idea is to temporarily cut all strings out of the source code and then search for the variable name.

Assuming the source code is valid (no syntax errors), you can cut everything from the first occurring double quote (") to the next unescaped double quote.
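A minimal sketch of that cut-then-search idea, assuming single-line string literals and ignoring comments and char literals (a '"' char literal would confuse it). The class StripStrings and both method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: blank out string literals, then do a plain word search.
class StripStrings {

    // A quote, then any mix of ordinary characters and backslash escapes, then a quote.
    private static final Pattern STRING_LITERAL =
        Pattern.compile("\"(?:[^\"\\\\\n]|\\\\.)*\"");

    /** Replaces every string literal with an empty literal "". */
    static String stripStringLiterals(String source) {
        return STRING_LITERAL.matcher(source).replaceAll("\"\"");
    }

    /** Counts whole-word occurrences of `name` outside string literals. */
    static int countOccurrences(String source, String name) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(name) + "\\b")
                           .matcher(stripStringLiterals(source));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String src = "String s = \"This is a \\\"test\\\"!\"; int test = 1; test++;";
        System.out.println(stripStringLiterals(src));
        System.out.println(countOccurrences(src, "test"));
    }
}
```

Stripping shifts character offsets, so if positions in the original file matter, one option is to replace each literal with filler of the same length instead of an empty literal.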

Note that one-character variable names (like d) need some additional handling, because d is also the suffix that makes the compiler treat the preceding number as a double (e.g. double dbl = 6d).
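One way that additional handling might look, assuming the search is done with a regex: require that the name is neither preceded nor followed by an identifier character. Since a digit counts as one, the d in 6d is skipped. The class SingleLetterName is invented for this sketch, and it does not deal with strings or comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: match a name only when it stands alone as an identifier.
class SingleLetterName {

    /** Counts occurrences of `name` that are not part of a longer identifier or number. */
    static int count(String source, String name) {
        // [\w$] covers letters, digits, '_' and '$', i.e. Java identifier characters (ASCII only).
        Pattern p = Pattern.compile("(?<![\\w$])" + Pattern.quote(name) + "(?![\\w$])");
        Matcher m = p.matcher(source);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String src = "double dbl = 6d; int d = 2; d++;";
        System.out.println(count(src, "d")); // counts "int d" and "d++", not "6d", "dbl" or "double"
    }
}
```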

EDIT: I was assuming that you wanted to build an application or piece of code that does a lightweight check for variable names.
If you are working inside an editor, I recommend using an advanced IDE such as NetBeans or Eclipse.
Otherwise, if you also want to check for correct syntax, you'll need to build your own interpreter (or download one from the internet).


 