Java Regular Expressions for escaping characters inside quotes
I have a document. I want to replace all the arithmetic characters (+, -, *, /) in the document with their names (add, sub, mult, div), except when these characters occur inside double quotes.
For example, the input:
a + b;
"a + b";
should produce the output:
a add b;
"a + b";
You can think of the document as a C program in which I want to take the arithmetic operations and convert them to their meaning (add, sub, ...), but I don't want to process an arithmetic operation if it is inside double quotes.
How can I capture this using Java regular expressions?
The following regex (try it on regex101)
[^\"].*(\+|\-|\*|\/).*[^\"]\;
matches:
- [^\"] - anything that's not "
- .* - followed by anything
- (\+|\-|\*|\/) - the capture group; captures a +, -, *, or /
- .* - followed by anything
- [^\"] - anything that's not "
- \; - followed by a literal ;
Since you're using this in Java, you'll have to escape all of the backslashes (and the double quotes) again inside the string literal.
Java-compatible regex:
"[^\\\"].*(\\+|\\-|\\*|\\/).*[^\\\"]\\;"
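A minimal sketch for trying the Java-compatible regex on the question's sample lines with Matcher.find(). The class and method names are hypothetical. Note that the pattern rejects the quoted line only because the character just before `;` is a `"`; it does not track whether the operator itself is inside quotes, so the third call below shows a false positive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the answer's regex to sample lines.
class QuoteRegexCheck {
    // Java string-literal form of [^\"].*(\+|\-|\*|\/).*[^\"]\;
    static final Pattern OP_LINE =
            Pattern.compile("[^\\\"].*(\\+|\\-|\\*|\\/).*[^\\\"]\\;");

    // Returns the captured operator (group 1), or null if the line does not match.
    static String findOperator(String line) {
        Matcher m = OP_LINE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findOperator("a + b;"));          // +
        System.out.println(findOperator("\"a + b\";"));      // null
        // False positive: the only operator is quoted, but the char before ';' is a space.
        System.out.println(findOperator("s = \"a + b\" ;")); // +
    }
}
```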
You can use the String replaceAll method.
Example:
public class MyTest {
    public static void main(String[] args) {
        String operation = "\"a + b / c\"";
        // replaceAll rewrites every match, including operators inside the quotes
        String result = operation.replaceAll("\\+", "add").replaceAll("\\/", "div");
        System.out.println(result);
    }
}
Will output: "a add b div c"
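The replaceAll chain above rewrites operators everywhere, including inside double quotes (the sample string is itself quoted). A common alternative, swapped in here rather than taken from any answer in this thread, is a single pass whose pattern matches either a whole quoted string or one operator; quoted strings are copied back unchanged. This is a sketch assuming Java 9+ (for Map.of and Matcher.appendReplacement(StringBuilder, String)) and backslash escapes inside strings; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replace + - * / with names, skipping double-quoted strings.
class SkipQuotedReplace {
    private static final Map<String, String> NAMES =
            Map.of("+", "add", "-", "sub", "*", "mult", "/", "div");

    // Alternative 1: a double-quoted string (backslash escapes allowed).
    // Alternative 2: a single arithmetic operator, captured in group 1.
    private static final Pattern TOKEN =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"|([+*/-])");

    static String replaceOperators(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Quoted string: group 1 is null, so copy the whole match unchanged.
            String replacement = m.group(1) == null ? m.group() : NAMES.get(m.group(1));
            // quoteReplacement keeps any '$' or '\' in quoted text literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceOperators("a + b;"));     // a add b;
        System.out.println(replaceOperators("\"a + b\";")); // "a + b";
    }
}
```

Because the quoted-string alternative is tried at the same position as the operator alternative, the engine consumes a whole string before it can ever see an operator inside it.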
Granted, this looks a bit like homework, but to me it is more of a regex question than a Java question.
Here's an example of how you can achieve the output requested in the original question:
$ cat TestRegex.java
public class TestRegex {
    public static void main(final String[] args) {
        String inputString = "c = a + b; cout << \" a + b \";";
        System.out.println("inputString: '" + String.valueOf(inputString) + "'.");
        // The lookahead (?=(?:[^"]|"[^"]*")*$) succeeds only if an even number of
        // double quotes follow the operator, i.e. the operator is outside quotes
        // (assumes balanced quotes and no escaped quotes).
        System.out.println("replace (+, add) ex: '" + String.valueOf(inputString.replaceAll("(\\+)(?=(?:[^\"]|\"[^\"]*\")*$)", "add")) + "'.");
        System.out.println("replacedAll: '" +
            String.valueOf(
                inputString
                    .replaceAll("(\\+)(?=(?:[^\"]|\"[^\"]*\")*$)", "add")
                    .replaceAll("(\\-)(?=(?:[^\"]|\"[^\"]*\")*$)", "sub")
                    .replaceAll("(\\*)(?=(?:[^\"]|\"[^\"]*\")*$)", "mult")
                    .replaceAll("(\\/)(?=(?:[^\"]|\"[^\"]*\")*$)", "div")
            ) + "'."
        );
    }
}
With sample output:
$ java TestRegex
inputString: 'c = a + b; cout << " a + b ";'.
replace (+, add) ex: 'c = a add b; cout << " a + b ";'.
replacedAll: 'c = a add b; cout << " a + b ";'.
Modifying the input string further and testing again, I get:
$ java TestRegex
inputString: 'c = a + b + d / e * f - g; cout << " a + b ";'.
replace (+, add) ex: 'c = a add b add d / e * f - g; cout << " a + b ";'.
replacedAll: 'c = a add b add d div e mult f sub g; cout << " a + b ";'.
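The four chained replaceAll calls can be folded into one pass by putting all operators in a character class and keeping the same quote-counting lookahead. A sketch assuming Java 9+; the class and method names are hypothetical, and like the original it assumes balanced, unescaped double quotes.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical one-pass variant of the four chained replaceAll calls above.
class LookaheadReplace {
    private static final Map<String, String> NAMES =
            Map.of("+", "add", "-", "sub", "*", "mult", "/", "div");

    // Operator, then (lookahead) only non-quote chars or complete "..." pairs
    // up to end of input, i.e. an even number of double quotes remain.
    private static final Pattern OP_OUTSIDE_QUOTES =
            Pattern.compile("([+*/-])(?=(?:[^\"]|\"[^\"]*\")*$)");

    static String replaceOperators(String input) {
        Matcher m = OP_OUTSIDE_QUOTES.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Names contain no '$' or '\', so they are safe as replacement text.
            m.appendReplacement(sb, NAMES.get(m.group(1)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // c = a add b add d div e mult f sub g; cout << " a + b ";
        System.out.println(replaceOperators(
                "c = a + b + d / e * f - g; cout << \" a + b \";"));
    }
}
```

Because the lookahead rescans to the end of the input for every operator it tests, the cost grows roughly quadratically with input length; that is fine for a line, but worth keeping in mind for whole files.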
Hope this helps.
My brain hurts.
test1.c
#include <stdio.h>
#define MACRO1 a+b-c*d/e
#define MACRO2 "a+b-c*\
d/e\
a + b - c * d / e \
", a+b-c*d/e
const char* str = "\
as\"\\/\"-\\\\+df\
\\""asdf+""-*\\""/""\
";
/* comment + - * /
a / b * c - d + e */
// comment + - * / blah "+ - * /" \+\\\
char dq = '"'+0*'\"'/'\'';
char* cp = &dq;
const int* ip1;
int const* ip2;
int const * const * ip3;
void* vp;
long (*fp)(int(*));
char *a[23], *ZXCV,
**p2p
,***(xx),
* * c, * * * d[34];
struct s {};
typedef struct s* blah;
struct s s1,*S2;
enum E {E1};
enum E* e;
union U {};
union U* u;
int main(void) {
int x = 1+2-3*4/5;
x++ +1; ++x+2;
x-- -3; --x+4;
x = - --x;
x = + ++x;
x = -- x -3;
x = ++ x +4;
x += +1;
x -= -1;
x *= +1;
x /= -1;
ip1 = (int const *)str;
int y = * ip1;
blah *pblah; // can't recognize typedef
#define OPAQUE int a =
OPAQUE*ip1; // can't recognize macro
printf("test: %d %s\n", x, str );
} // end main() 1+2-3*4/5
COpSub.java
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.Map;
import java.util.HashMap;
import java.nio.file.Files;
import java.nio.file.Paths;
public class COpSub {
public static void main(String[] args) throws Exception {
if (args.length != 2) { System.err.println("error: require two arguments."); System.exit(1); }
String fileName = args[0];
String encoding = args[1];
String source = readFile(fileName,encoding);
System.out.print(sub(source));
System.exit(0);
} // end main()
public static String sub(String s) throws Exception {
Map<String,String> m = new HashMap<String,String>();
// note: replacements must be escaped for appendReplacement()!
m.put("+","plus");
m.put("+=","plusequals");
m.put("-","minus");
m.put("-=","minusequals");
m.put("*","mul");
m.put("*=","mulequals");
m.put("/","div");
m.put("/=","divequals");
m.put("++","plusplus");
m.put("--","minusminus");
String typeAlternation = "void|char|signed\\s+char|unsigned\\s+char|short|short\\s+int|signed\\s+short|signed\\s+short\\s+int|unsigned\\s+short|unsigned\\s+short\\s+int|int|signed|unsigned|signed\\s+int|unsigned\\s+int|long|long\\s+int|signed\\s+long|signed\\s+long\\s+int|unsigned\\s+long|unsigned\\s+long\\s+int|long\\s+long|long\\s+long\\s+int|signed\\s+long\\s+long|signed\\s+long\\s+long\\s+int|unsigned\\s+long\\s+long|unsigned\\s+long\\s+long\\s+int|float|double|long\\s+double|(?:struct|enum|union)\\s+\\w+|const";
String safeCluster = ""
+"(?:" // overarching cluster
+ "\\s*" // skip over all leading whitespace to get to the interesting stuff
+ "(?:" // safe extent alternation
+ "(?:"+typeAlternation+")(?:\\s*\\*)+" // safe extent #1: pointer (to pointer to pointer...) of type
+ "|[+-]\\s+[+-]" // safe extent #2: non-lvalue requiring whitespace separation
+ "|[~!%^&*(=+\\[|;,?-](?:\\s*(?!\\+\\+|--)[+*-](?<!\\+\\+|--))++" // safe extent #3: non-lvalue not requiring whitespace separation -- can't include slash -- must be possessive to not give back part of pre/post increment/decrement -- fix broken vim syntax highlighting: )]
+ "|[^'\"+*/-]" // safe extent #4: guarantee safe punctuation char
+ ")" // end safe extent alternation
+")*+" // possessive gobble of safe extents
;
Pattern pattern = Pattern.compile(""
+"\\G" // start from previous match, or start-of-string for first search
+"(" // capture prefix
+ safeCluster // possessive gobble of safe extents
+ "(?:" // possessive zero-or-more unsafe extent clusters (possessive required for final match to not give up slash pattern)
+ "(?:" // unsafe extent cluster alternation (no suffix)
+ "'\\\\?.'" // unsafe extent #1: single-quoted char
+ "|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"" // unsafe extent #2: double-quoted string; note this has its own internal safe gobble followed by zero-or-more unsafe extent cluster with suffix
+ "|/\\*[^*]*(?:\\*[^/][^*]*)*\\*/" // unsafe extent #3: traditional C comments; ditto
+ "|//[^\\n]*\\n" // unsafe extent #4: modern C comments; ditto
+ ")" // end unsafe extent cluster alternation (no suffix)
+ safeCluster // unsafe extent cluster safe suffix
+ ")*+" // end possessive zero-or-more unsafe extent clusters
+")" // end capture prefix
+"(\\+\\+|--|[+*/-]=?)" // capture operator
, Pattern.DOTALL );
StringBuffer b = new StringBuffer();
Matcher matcher = pattern.matcher(s);
boolean lastMatchWasOpAssign = false;
while (matcher.find()) {
if (lastMatchWasOpAssign)
matcher.appendReplacement(b, "$1$2" );
else
matcher.appendReplacement(b, "$1 "+m.get(matcher.group(2))+' ' );
lastMatchWasOpAssign = matcher.group(2).length() == 2 && matcher.group(2).charAt(1) == '=';
} // end while
matcher.appendTail(b);
return b.toString();
} // end sub()
public static String readFile(String fileName, String encoding ) throws Exception {
byte[] encoded = Files.readAllBytes(Paths.get(fileName));
return new String(encoded, encoding );
} // end readFile()
} // end class COpSub
Demo
> gcc test1.c -o test1;
> ./test1;
test: -4 as"\/"-\\+df\asdf+-*\/
> javac COpSub.java;
> CLASSPATH=. java COpSub test1.c UTF-8;
#include <stdio.h>
#define MACRO1 a plus b minus c mul d div e
#define MACRO2 "a+b-c*\
d/e\
a + b - c * d / e \
", a plus b minus c mul d div e
const char* str = "\
as\"\\/\"-\\\\+df\
\\""asdf+""-*\\""/""\
";
/* comment + - * /
a / b * c - d + e */
// comment + - * / blah "+ - * /" \+\\\
char dq = '"' plus 0 mul '\"' div '\'';
char* cp = &dq;
const int* ip1;
int const* ip2;
int const * const * ip3;
void* vp;
long (*fp)(int(*));
char *a[23], *ZXCV,
**p2p
,***(xx),
* * c, * * * d[34];
struct s {};
typedef struct s* blah;
struct s s1,*S2;
enum E {E1};
enum E* e;
union U {};
union U* u;
int main(void) {
int x = 1 plus 2 minus 3 mul 4 div 5;
x plusplus plus 1; plusplus x plus 2;
x minusminus minus 3; minusminus x plus 4;
x = - minusminus x;
x = + plusplus x;
x = minusminus x minus 3;
x = plusplus x plus 4;
x plusequals +1;
x minusequals -1;
x mulequals +1;
x divequals -1;
ip1 = (int const *)str;
int y = * ip1;
blah mul pblah; // can't recognize typedef
#define OPAQUE int a =
OPAQUE mul ip1; // can't recognize macro
printf("test: %d %s\n", x, str );
} // end main() 1+2-3*4/5
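The core idea in COpSub is to anchor each match with \G so the scan resumes exactly where the previous match ended, possessively consuming "safe" text and whole quoted strings before capturing the next operator. Below is a much-reduced sketch of that idea, assuming only double-quoted strings need protecting (no char literals, comments, pointer declarations, or ++/+= handling) and Java 9+; the class name GAnchorReplace is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, heavily simplified version of the \G scanning approach in COpSub.
class GAnchorReplace {
    private static final Map<String, String> NAMES =
            Map.of("+", "add", "-", "sub", "*", "mult", "/", "div");

    // \G      : start exactly where the previous match ended
    // group 1 : possessively skip non-operator chars and whole "..." strings
    // group 2 : the next operator outside any string
    private static final Pattern NEXT_OP = Pattern.compile(
            "\\G((?:\"(?:[^\"\\\\]|\\\\.)*\"|[^\"+*/-])*+)([+*/-])");

    static String replaceOperators(String input) {
        Matcher m = NEXT_OP.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Keep the skipped prefix verbatim, then emit the operator's name.
            String replacement = m.group(1) + NAMES.get(m.group(2));
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        // No operator left (or an unterminated string): copy the rest unchanged.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceOperators("a + b;"));     // a add b;
        System.out.println(replaceOperators("\"a + b\";")); // "a + b";
    }
}
```

The possessive *+ matters: without it, a failed attempt could backtrack into a quoted string and let the operator group match a character inside it.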
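Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.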