
How do I regex remove whitespace and newlines from a text, except for when they are in a json's string?

I have an instruction like:

db.insert( {
    _id:3,
    cost:{_0:11},
    description:"This is a description.\nCool, isn\'t it?"
});

The Eclipse plugin I am using, MonjaDB, splits the instruction on newlines, so I get each line as a separate instruction, which is bad. I fixed that by splitting on ;(\r|\n)+ instead, which now keeps the entire instruction together. However, when I sanitize the newlines between the parts of the JSON, the \n and \r inside strings in the JSON itself get sanitized as well.

How do I avoid removing \t, \r, \n from within JSON strings, which are, of course, delimited by "" or ''?

You need to arrange to ignore whitespace when it appears within quotes. So, as suggested by one of the commenters:

\s+ | ( "  (?: [^"\\]  |  \\ . ) * " )              // White-space inserted for readability

Match Java whitespace or a double-quoted string, where a string consists of a " followed by any number of items that are each either a non-escape, non-quote character or an escape (backslash) plus any character, then a final ". This way, whitespace inside a string is consumed as part of the string match rather than being matched as whitespace.

Then replace each match with $1 if $1 is not null, and with nothing otherwise.

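// Pattern.COMMENTS makes the regex engine ignore the spaces inside the pattern; DOTALL lets . match newlines.
Pattern clean = Pattern.compile(" \\s+ | ( \" (?: [^\"\\\\] | \\\\ . ) * \" ) ", Pattern.COMMENTS | Pattern.DOTALL);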

StringBuffer sb = new StringBuffer();
Matcher m = clean.matcher( json );
while (m.find()) {
    m.appendReplacement(sb, "" );
    // Don't pass m.group(1) to appendReplacement: any $ or \ it contains would be
    // interpreted as a group reference or escape, and you could get an error.
    if ( m.group(1) != null )
        sb.append( m.group(1) );
}
m.appendTail(sb);

String cleanJson = sb.toString();

This is totally off the top of my head, but I'm pretty sure it's close to what you want.
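On Java 9 or later, the "replace with $1 if $1 is not null" step can probably be written more compactly with the `Matcher.replaceAll` overload that takes a function. The following is only a minimal sketch under that assumption; the class name `WhitespaceStripper` and the method `strip` are made up for illustration and are not part of the answer above. The pattern is the same one, written without the readability spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class (name invented for this sketch); requires Java 9+.
final class WhitespaceStripper {
    // A run of whitespace, or (captured in group 1) a double-quoted string
    // in which a backslash escapes the following character.
    private static final Pattern CLEAN =
            Pattern.compile("\\s+|(\"(?:[^\"\\\\]|\\\\.)*\")", Pattern.DOTALL);

    static String strip(String text) {
        // Whitespace matches leave group 1 null and are dropped; string matches
        // are put back unchanged. quoteReplacement stops any $ or \ inside the
        // string from being treated as a group reference or escape.
        return CLEAN.matcher(text).replaceAll(
                mr -> mr.group(1) == null ? "" : Matcher.quoteReplacement(mr.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(strip("a : \"x  y\" ,\n b"));
    }
}
```

The loop with `appendReplacement` and `appendTail` shown above does the same job and also works on older Java versions, so this is a matter of taste rather than a correction.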

Edit: I've just got access to a Java IDE and tried out my solution. I had made a couple of mistakes in my code, including using \\. instead of . in the Pattern. So I have fixed that up and run it on a variation of your sample:

db.insert( {
    _id:3,
    cost:{_0:11},
    description:"This is a \"description\" with an embedded newline: \"\n\".\nCool, isn\'t it?"
});

The code:

String json = "db.insert( {\n" +
        "    _id:3,\n" +
        "    cost:{_0:11},\n" +
        "    description:\"This is a \\\"description\\\" with an embedded newline: \\\"\\n\\\".\\nCool, isn\\'t it?\"\n" +
        "});";

// insert above code

System.out.println(cleanJson);

This produces:

db.insert({_id:3,cost:{_0:11},description:"This is a \"description\" with an embedded newline: \"\n\".\nCool, isn\'t it?"});

which is the same JSON expression with all whitespace removed outside quoted strings, and with whitespace and newlines retained inside quoted strings.
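The question also mentions strings delimited by '', which the pattern above does not protect. One possible extension, sketched here as an assumption rather than as part of the tested answer, is to add a second alternative for single-quoted strings inside the same capturing group. The class name `QuoteAwareStripper` and the method `strip` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class (name invented for this sketch).
final class QuoteAwareStripper {
    private static final Pattern CLEAN = Pattern.compile(
            "\\s+"                            // a run of whitespace, or
            + "|(\"(?:[^\"\\\\]|\\\\.)*\""    // a double-quoted string, or
            + "|'(?:[^'\\\\]|\\\\.)*')",      // a single-quoted string (both in group 1)
            Pattern.DOTALL);

    static String strip(String text) {
        StringBuffer sb = new StringBuffer();
        Matcher m = CLEAN.matcher(text);
        while (m.find()) {
            // Drop the match, then re-append it verbatim if it was a quoted string.
            m.appendReplacement(sb, "");
            if (m.group(1) != null) {
                sb.append(m.group(1));
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("f( 'a  b' , \"c d\" )"));
    }
}
```

Because the regex engine scans left to right, whichever quote character opens first decides which alternative applies, so a double quote inside a single-quoted string (or the reverse) is left alone. An unterminated quote is not matched as a string at all, so whitespace after it would still be stripped.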
