
Java Regular Expressions for escaping characters inside quotes

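I have a document.

I want to replace every arithmetic character in the document (+ - * /) with its name (add, sub, mult, div), unless the character appears inside double quotes.

For example: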

a + b;
"a + b";

OUTPUT:

a add b;
"a + b";

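You can think of the document as a C program in which I want to convert the arithmetic operators into their names (add, sub, ...), but anything inside double quotes should be left untouched.

How can I capture this using Java regular expressions?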

The following regular expression (try it on regex101):

[^\"].*(\+|\-|\*|\/).*[^\"]\;

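matches:

[^\"] - any character that is not a "

.* - followed by anything

(\+|\-|\*|\/) - capture group; captures one of + - * /

.* - followed by anything

[^\"] - any character that is not a "

\; - a literal semicolon

Since you are using it in Java, you must escape all the backslashes again.

Java-compatible regex: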

"[^\\\"].*(\\+|\\-|\\*|\\/).*[^\\\"]\\;"

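As a rough illustration of how that pattern behaves, here is a minimal sketch that applies it to single statements with `Pattern` and `Matcher`. The class name `LineCheck` and the method `hasUnquotedOperator` are made up for this example and are not part of the answer above.

```java
import java.util.regex.Pattern;

public class LineCheck {
    // The Java-escaped pattern from the answer above, compiled once.
    private static final Pattern ARITHMETIC_LINE =
        Pattern.compile("[^\\\"].*(\\+|\\-|\\*|\\/).*[^\\\"]\\;");

    // Hypothetical helper: true when the pattern finds a match somewhere in the line.
    public static boolean hasUnquotedOperator(String line) {
        return ARITHMETIC_LINE.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(hasUnquotedOperator("a + b;"));             // true
        System.out.println(hasUnquotedOperator("\"a + b\";"));         // false
        // Limitation: the pattern only looks at the characters around the
        // statement, so an operator inside an embedded string still matches.
        System.out.println(hasUnquotedOperator("printf(\"a+b\", c);")); // true
    }
}
```

The pattern only detects candidate lines; it does not say where the operator is or whether that particular operator sits inside quotes, so it is probably best treated as a first filter rather than a complete solution.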
You can use the String replaceAll method.

Example:

public class MyTest {

    /**
     * @param args
     */
    public static void main(String[] args) {

        String operation = "\"a + b / c\"";
        String result = operation.replaceAll("\\+", "add").replaceAll("\\/", "div");
        System.out.println(result);

    }

}

This prints: "a add b div c"
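The chained replaceAll calls above rewrite every operator, including the ones inside the double quotes. One possible way to leave quoted text alone is to match either a whole quoted string or a single operator, and copy the quoted string back unchanged. The sketch below assumes strings contain no escaped quotes; the class name `QuoteAwareReplace`, the method `replaceOutsideQuotes`, and the `NAMES` map are invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareReplace {
    // Operator names as given in the question.
    private static final Map<String, String> NAMES = new HashMap<>();
    static {
        NAMES.put("+", "add");
        NAMES.put("-", "sub");
        NAMES.put("*", "mult");
        NAMES.put("/", "div");
    }

    // Group 1: a complete double-quoted string. Group 2: a single operator.
    private static final Pattern TOKEN = Pattern.compile("(\"[^\"]*\")|([+*/-])");

    public static String replaceOutsideQuotes(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // A quoted string is copied back as-is; an operator is replaced by its name.
            String replacement = (m.group(1) != null) ? m.group(1) : NAMES.get(m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: c = a add b sub d; cout << " a + b ";
        System.out.println(replaceOutsideQuotes("c = a + b - d; cout << \" a + b \";"));
    }
}
```

Because the quoted-string alternative is tried first at each position, the matcher consumes the whole string before it can see any operator inside it.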

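Granted, this looks like homework, but to me it looks more like a regex question than a Java question.

Here is an example of how to achieve the output requested in the original question: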

$ cat TestRegex.java 
public class TestRegex {
    public static void main(final String[] args) {
        String inputString = "c = a + b; cout << \" a + b \";";
        System.out.println("inputString: '" + String.valueOf(inputString) + "'.");
        System.out.println("replace (+, add) ex: '" + String.valueOf(inputString.replaceAll("(\\+)(?=(?:[^\"]|\"[^\"]*\")*$)", "add")) + "'.");
        System.out.println("replacedAll: '" +
            String.valueOf(
                inputString
                .replaceAll("(\\+)(?=(?:[^\"]|\"[^\"]*\")*$)", "add")
                .replaceAll("(\\-)(?=(?:[^\"]|\"[^\"]*\")*$)", "sub")
                .replaceAll("(\\*)(?=(?:[^\"]|\"[^\"]*\")*$)", "mult")
                .replaceAll("(\\/)(?=(?:[^\"]|\"[^\"]*\")*$)", "div")
            ) + "'."
        );
    }
}

With sample output:

$ java TestRegex
inputString: 'c = a + b; cout << " a + b ";'.
replace (+, add) ex: 'c = a add b; cout << " a + b ";'.
replacedAll: 'c = a add b; cout << " a + b ";'.

Modifying the input string further and testing again, I get:

$ java TestRegex
inputString: 'c = a + b + d / e * f - g; cout << " a + b ";'.
replace (+, add) ex: 'c = a add b add d / e * f - g; cout << " a + b ";'.
replacedAll: 'c = a add b add d div e mult f sub g; cout << " a + b ";'.
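The lookahead `(?=(?:[^"]|"[^"]*")*$)` used in TestRegex succeeds only when the rest of the input can be read as non-quote characters plus complete `"..."` pairs, which means an even number of quotes follows the operator and the operator is therefore outside a string. The sketch below factors that lookahead into a constant and loops over the operators instead of repeating the pattern four times. The names `LookaheadReplace`, `OUTSIDE_QUOTES`, and `replaceOperators` are hypothetical, and the approach assumes quotes are balanced and never escaped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class LookaheadReplace {
    // True at a position when only non-quote characters and complete
    // quoted strings remain before the end of the input.
    private static final String OUTSIDE_QUOTES = "(?=(?:[^\"]|\"[^\"]*\")*$)";

    public static String replaceOperators(String input) {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("+", "add");
        names.put("-", "sub");
        names.put("*", "mult");
        names.put("/", "div");

        String result = input;
        for (Map.Entry<String, String> e : names.entrySet()) {
            // Pattern.quote makes the operator a literal, so no manual escaping is needed.
            String regex = Pattern.quote(e.getKey()) + OUTSIDE_QUOTES;
            result = result.replaceAll(regex, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replaceOperators("c = a + b + d / e * f - g; cout << \" a + b \";"));
    }
}
```

Each lookahead rescans the remainder of the input, so on large files this is likely slower than a single left-to-right pass.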

Hope this helps.

My head hurts.

test1.c

#include <stdio.h>

#define MACRO1 a+b-c*d/e
#define MACRO2 "a+b-c*\
    d/e\
    a + b - c * d / e \
    ", a+b-c*d/e

const char* str = "\
    as\"\\/\"-\\\\+df\
\\""asdf+""-*\\""/""\
";

/* comment + - * /
 a / b * c - d + e */
// comment + - * / blah "+ - * /" \+\\\

char dq = '"'+0*'\"'/'\'';
char* cp = &dq;
const int* ip1;
int const* ip2;
int const * const * ip3;
void* vp;
long (*fp)(int(*));

char *a[23], *ZXCV,
**p2p
,***(xx),
* * c, * * * d[34];

struct s {};
typedef struct s* blah;
struct s s1,*S2;
enum E {E1};
enum E* e;
union U {};
union U* u;

int main(void) {
    int x = 1+2-3*4/5;
    x++ +1; ++x+2;
    x-- -3; --x+4;
    x = - --x;
    x = + ++x;
    x = -- x -3;
    x = ++ x +4;
    x += +1;
    x -= -1;
    x *= +1;
    x /= -1;
    ip1 = (int const *)str;
    int y = * ip1;
    blah *pblah; // can't recognize typedef
    #define OPAQUE int a =
    OPAQUE*ip1; // can't recognize macro
    printf("test: %d %s\n", x, str );
} // end main() 1+2-3*4/5

COpSub.java

import java.util.regex.Pattern;
import java.util.regex.Matcher;

import java.util.Map;
import java.util.HashMap;

import java.nio.file.Files;
import java.nio.file.Paths;

public class COpSub {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) { System.err.println("error: require two arguments."); System.exit(1); }
        String fileName = args[0];
        String encoding = args[1];
        String source = readFile(fileName,encoding);
        System.out.print(sub(source));
        System.exit(0);
    } // end main()

    public static String sub(String s) throws Exception {

        Map<String,String> m = new HashMap<String,String>();
        // note: replacements must be escaped for appendReplacement()!
        m.put("+","plus");
        m.put("+=","plusequals");
        m.put("-","minus");
        m.put("-=","minusequals");
        m.put("*","mul");
        m.put("*=","mulequals");
        m.put("/","div");
        m.put("/=","divequals");
        m.put("++","plusplus");
        m.put("--","minusminus");

        String typeAlternation = "void|char|signed\\s+char|unsigned\\s+char|short|short\\s+int|signed\\s+short|signed\\s+short\\s+int|unsigned\\s+short|unsigned\\s+short\\s+int|int|signed|unsigned|signed\\s+int|unsigned\\s+int|long|long\\s+int|signed\\s+long|signed\\s+long\\s+int|unsigned\\s+long|unsigned\\s+long\\s+int|long\\s+long|long\\s+long\\s+int|signed\\s+long\\s+long|signed\\s+long\\s+long\\s+int|unsigned\\s+long\\s+long|unsigned\\s+long\\s+long\\s+int|float|double|long\\s+double|(?:struct|enum|union)\\s+\\w+|const";
        String safeCluster = ""
            +"(?:"                                               // overarching cluster
            +  "\\s*"                                            // skip over all leading whitespace to get to the interesting stuff
            +  "(?:"                                             // safe extent alternation
            +    "(?:"+typeAlternation+")(?:\\s*\\*)+"           // safe extent #1: pointer (to pointer to pointer...) of type
            +    "|[+-]\\s+[+-]"                                 // safe extent #2: non-lvalue requiring whitespace separation
            +    "|[~!%^&*(=+\\[|;,?-](?:\\s*(?!\\+\\+|--)[+*-](?<!\\+\\+|--))++" // safe extent #3: non-lvalue not requiring whitespace separation -- can't include slash -- must be possessive to not give back part of pre/post increment/decrement -- fix broken vim syntax highlighting: )]
            +    "|[^'\"+*/-]"                                   // safe extent #4: guarantee safe punctuation char
            +  ")"                                               // end safe extent alternation
            +")*+"                                               // possessive gobble of safe extents
        ;

        Pattern pattern = Pattern.compile(""
            +"\\G"                                               // start from previous match, or start-of-string for first search
            +"("                                                 // capture prefix
            +  safeCluster                                       // possessive gobble of safe extents
            +  "(?:"                                             // possessive zero-or-more unsafe extent clusters (possessive required for final match to not give up slash pattern)
            +    "(?:"                                           // unsafe extent cluster alternation (no suffix)
            +      "'\\\\?.'"                                    // unsafe extent #1: single-quoted char
            +      "|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""         // unsafe extent #2: double-quoted string; note this has its own internal safe gobble followed by zero-or-more unsafe extent cluster with suffix
            +      "|/\\*[^*]*(?:\\*[^/][^*]*)*\\*/"             // unsafe extent #3: traditional C comments; ditto
            +      "|//[^\\n]*\\n"                               // unsafe extent #4: modern C comments; ditto
            +    ")"                                             // end unsafe extent cluster alternation (no suffix)
            +    safeCluster                                     // unsafe extent cluster safe suffix
            +  ")*+"                                             // end possessive zero-or-more unsafe extent clusters
            +")"                                                 // end capture prefix
            +"(\\+\\+|--|[+*/-]=?)"                              // capture operator
        , Pattern.DOTALL );

        StringBuffer b = new StringBuffer();
        Matcher matcher = pattern.matcher(s);
        boolean lastMatchWasOpAssign = false;
        while (matcher.find()) {
            if (lastMatchWasOpAssign)
                matcher.appendReplacement(b, "$1$2" );
            else
                matcher.appendReplacement(b, "$1 "+m.get(matcher.group(2))+' ' );
            lastMatchWasOpAssign = matcher.group(2).length() == 2 && matcher.group(2).charAt(1) == '=';
        } // end while
        matcher.appendTail(b);
        return b.toString();

    } // end sub()

    public static String readFile(String fileName, String encoding ) throws Exception {
        byte[] encoded = Files.readAllBytes(Paths.get(fileName));
        return new String(encoded, encoding );
    } // end readFile()

} // end class COpSub
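The comment in sub() notes that replacements must be escaped for appendReplacement(). That matters because `$` and `\` have special meaning in a replacement string. The names in the map above contain neither, but if a replacement could contain them, `Matcher.quoteReplacement` is the standard way to make it literal. A small sketch, with the hypothetical class `QuoteReplacementDemo` and method `replacePlus`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Replaces every '+' with the given text, treating the text literally.
    public static String replacePlus(String input, String literalReplacement) {
        Matcher m = Pattern.compile("\\+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Without quoteReplacement, "$1" would be read as a reference to
            // capture group 1, which this pattern does not have.
            m.appendReplacement(sb, Matcher.quoteReplacement(literalReplacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replacePlus("a+b", "$1")); // prints: a$1b
    }
}
```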

Demo

> gcc test1.c -o test1;
> ./test1;
test: -4  as"\/"-\\+df\asdf+-*\/

> javac COpSub.java;
> CLASSPATH=. java COpSub test1.c UTF-8;
#include <stdio.h>

#define MACRO1 a plus b minus c mul d div e
#define MACRO2 "a+b-c*\
    d/e\
    a + b - c * d / e \
    ", a plus b minus c mul d div e

const char* str = "\
    as\"\\/\"-\\\\+df\
\\""asdf+""-*\\""/""\
";

/* comment + - * /
 a / b * c - d + e */
// comment + - * / blah "+ - * /" \+\\\

char dq = '"' plus 0 mul '\"' div '\'';
char* cp = &dq;
const int* ip1;
int const* ip2;
int const * const * ip3;
void* vp;
long (*fp)(int(*));

char *a[23], *ZXCV,
**p2p
,***(xx),
* * c, * * * d[34];

struct s {};
typedef struct s* blah;
struct s s1,*S2;
enum E {E1};
enum E* e;
union U {};
union U* u;

int main(void) {
    int x = 1 plus 2 minus 3 mul 4 div 5;
    x plusplus   plus 1;  plusplus x plus 2;
    x minusminus   minus 3;  minusminus x plus 4;
    x = -  minusminus x;
    x = +  plusplus x;
    x =  minusminus  x  minus 3;
    x =  plusplus  x  plus 4;
    x  plusequals  +1;
    x  minusequals  -1;
    x  mulequals  +1;
    x  divequals  -1;
    ip1 = (int const *)str;
    int y = * ip1;
    blah  mul pblah; // can't recognize typedef
    #define OPAQUE int a =
    OPAQUE mul ip1; // can't recognize macro
    printf("test: %d %s\n", x, str );
} // end main() 1+2-3*4/5
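The string literals in the demo survive untouched, even though they contain escaped quotes, because the double-quoted-string sub-pattern in COpSub (unsafe extent #2) treats a backslash plus the following character as one unit. The sketch below isolates that sub-pattern so its behavior can be tried on its own; the class `StringLiteralDemo` and the method `literals` are made up for this illustration, and unlike COpSub it ignores comments and character literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringLiteralDemo {
    // Same sub-pattern as "unsafe extent #2" in COpSub: an opening quote,
    // runs of ordinary characters, backslash escapes, and a closing quote.
    private static final Pattern STRING_LITERAL =
        Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"", Pattern.DOTALL);

    // Collects every double-quoted literal found in the source text.
    public static List<String> literals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The C source text being scanned is: s = "a\"+b" + "c";
        String source = "s = \"a\\\"+b\" + \"c\";";
        System.out.println(literals(source)); // prints: ["a\"+b", "c"]
    }
}
```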
