
Regex for removing commas in a String when they are enclosed by quotes

I need to remove commas within a String only when they are enclosed by quotes.

Example:

String a = "123, \"Anders, Jr.\", John, john.anders@company.com,A"

After replacement it should be:

String a = "123, Anders Jr., John, john.anders@company.com,A"

Can you please give me sample Java code to do this?

Thanks much,

Lina

It also seems you need to remove the quotes, judging by your example.

You can't do that in a single regexp. You would need to match over each instance of

"[^"]*"

then strip the surrounding quotes and remove the commas. Are there any other troublesome characters? Can quote characters be escaped inside quotes, e.g. as '""'?
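A minimal Java sketch of that two-step idea, assuming quotes are never escaped inside a quoted section. The class and method names are made up for the illustration; the Matcher calls are from the standard java.util.regex package.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedCommaStripper {
    // One quoted section: a quote, any run of non-quote characters, a quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static String stripQuotedCommas(String input) {
        Matcher m = QUOTED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is the text between the quotes, so the quotes are dropped;
            // commas inside it are removed before it is written back.
            String inner = m.group(1).replace(",", "");
            // quoteReplacement keeps '$' and '\' in the data from being
            // interpreted as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(inner));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String a = "123, \"Anders, Jr.\", John, john.anders@company.com,A";
        System.out.println(stripQuotedCommas(a));
        // prints: 123, Anders Jr., John, john.anders@company.com,A
    }
}
```

An unbalanced quote is simply left in place, because the pattern only matches complete pairs.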

It looks like you are trying to parse CSV. If so, regex is insufficient for the task and you should look at one of the many free Java CSV parsers.

I believe you asked for a regex hoping for an "elegant" solution, but maybe a "normal" answer fits your needs better. This one handles your example perfectly, although I didn't check edge cases like two quotes together, so if you're going to use my example, test it thoroughly.

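boolean deleteCommas = false;
for (int i = 0; i < a.length(); i++) {
    char c = a.charAt(i);
    if (c == '"') {
        // Drop the quote itself and toggle the "inside quotes" state.
        a = a.substring(0, i) + a.substring(i + 1);
        deleteCommas = !deleteCommas;
        i--; // re-examine the character that shifted into position i
    } else if (c == ',' && deleteCommas) {
        // Drop a comma that sits between an opening and a closing quote.
        a = a.substring(0, i) + a.substring(i + 1);
        i--;
    }
}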

There are two major problems with the accepted answer. First, the regex "(.*)\"(.*),(.*)\"(.*)" will match the whole string if it matches anything, so it will remove at most one comma and two quotation marks.

Second, there's nothing to ensure that the comma and quotes will all be part of the same field; given the input ("foo", "bar") it will return ("foo "bar). It also doesn't account for newlines or escaped quotation marks, both of which are permitted in quoted fields.
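Both problems are easy to reproduce. The sketch below applies the accepted answer's pattern through String.replaceAll, which should behave the same as the find()/appendReplacement loop for this pattern; the class and method names are made up for the illustration.

```java
class AcceptedAnswerDemo {
    // Hypothetical helper: one pass of the accepted answer's replacement.
    static String apply(String input) {
        return input.replaceAll("(.*)\"(.*),(.*)\"(.*)", "$1$2$3$4");
    }

    public static void main(String[] args) {
        // Two quoted fields: the single match spans the whole string,
        // so only the last field loses its comma and quotes.
        System.out.println(apply("\"a,b\",\"c,d\""));    // prints: "a,b",cd

        // The only comma sits between two fields, so the quotes that get
        // removed belong to different fields.
        System.out.println(apply("(\"foo\", \"bar\")")); // prints: ("foo "bar)
    }
}
```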

You can use regexes to parse CSV data, but it's much trickier than most people expect. But why bother fighting with it when, as bobince pointed out, there are several free CSV libraries available for download?

This should work (it needs a regex engine that supports variable-length lookbehind):

s/(?<="[^"]*),(?=[^"]*")//g
s/"//g

This looks like a line from a CSV file; parsing it with any reasonable CSV library would deal with this issue for you automatically, at least by reading the quoted value into a single field.

The following Perl works for most cases:

open(DATA,'in/my.csv');
while(<DATA>){
  if(/(,\s*|^)"[^"]*,[^"]*"(\s*,|$)/){
    print "Before: $_";
    while(/(,\s*|^)"[^"]*,[^"]*"(\s*,|$)/){
      s/((?:^|,\s*)"[^"]*),([^"]*"(?:\s*,|$))/$1 $2/
    }
    print "After: $_";
  }
}

It's looking for:

  • (comma plus optional spaces) or start of line
  • a quote
  • 0 or more non-quotes
  • a comma
  • 0 or more non-quotes
  • a quote
  • (optional spaces plus comma) or end of line

If found, it keeps replacing the comma with a space until no more matches are found.

It works because of an assumption that the opening quote will be preceded by a comma plus optional spaces (or will be at the start of the line), and the closing quote will be followed by optional spaces plus a comma, or will be at the end of the line.
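Since the question asked for Java, here is a rough port of the same loop, under the same assumption about where quotes may appear. The class and method names are invented for the sketch; like the Perl, it replaces each quoted comma with a space and keeps the quotes.

```java
import java.util.regex.Pattern;

class QuotedFieldCommaReplacer {
    // Group 1: field start (line start, or comma plus spaces), the opening
    //          quote, and the quoted text up to a comma.
    // Group 2: the rest of the quoted text, the closing quote, and the field
    //          end (spaces plus comma, or end of line).
    private static final Pattern COMMA_IN_QUOTED_FIELD =
        Pattern.compile("((?:^|,\\s*)\"[^\"]*),([^\"]*\"(?:\\s*,|$))");

    static String replaceQuotedCommas(String line) {
        String previous;
        do {
            // Replace one comma per pass and stop once nothing changes.
            previous = line;
            line = COMMA_IN_QUOTED_FIELD.matcher(line).replaceFirst("$1 $2");
        } while (!line.equals(previous));
        return line;
    }

    public static void main(String[] args) {
        System.out.println(replaceQuotedCommas("\"x,y,z\",1")); // prints: "x y z",1
    }
}
```

Replacing one comma per pass mirrors the inner while loop of the Perl version; a field with several commas takes several passes.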

I'm sure there are cases where it will fail. If anyone can post them, I'd be keen to see them...

My answer is not a regex, but I believe it is simpler and more efficient. Convert the line to a char array, then go through each char. Keep track of whether the number of quotes seen so far is even or odd. If it is odd and the current char is a comma, don't add it. It should look something like this.

public String removeCommaBetweenQuotes(String line) {
    int quoteCount = 0; // number of quote characters seen so far
    StringBuilder newLine = new StringBuilder();

    for (char c : line.toCharArray()) {
        if (c == '"') {
            quoteCount++;
            newLine.append(c); // the quotes themselves are kept
        } else if (quoteCount % 2 == 1 && c == ',') {
            // odd count means we are inside quotes: drop the comma
        } else {
            newLine.append(c);
        }
    }

    return newLine.toString();
}

Probably grossly inefficient, but it seems to work.

import java.util.regex.*;

StringBuffer ResultString = new StringBuffer();

try {
    Pattern regex = Pattern.compile("(.*)\"(.*),(.*)\"(.*)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    Matcher regexMatcher = regex.matcher(a);
    while (regexMatcher.find()) {
        try {
            // You can vary the replacement text for each match on-the-fly
            regexMatcher.appendReplacement(ResultString, "$1$2$3$4");
        } catch (IllegalStateException ex) {
            // appendReplacement() called without a prior successful call to find()
        } catch (IllegalArgumentException ex) {
            // Syntax error in the replacement text (unescaped $ signs?)
        } catch (IndexOutOfBoundsException ex) {
            // Non-existent backreference used in the replacement text
        } 
    }
    regexMatcher.appendTail(ResultString);
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

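This works fine once the loop condition uses '<' instead of '>' (with '>' the loop body never runs). Note that this version toggles on single quotes rather than double quotes: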

boolean deleteCommas = false;
for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);
    if (c == '\'') {
        // Drop the single quote and toggle the "inside quotes" state.
        text = text.substring(0, i) + text.substring(i + 1);
        deleteCommas = !deleteCommas;
        i--; // re-examine the character that shifted into position i
    } else if (c == ',' && deleteCommas) {
        text = text.substring(0, i) + text.substring(i + 1);
        i--;
    }
}

A simpler approach would be to replace the matches of this regular expression:

("[^",]+),([^"]+")

with this:

$1$2
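In Java this could look like the sketch below (the class and method names are illustrative). Note that, as written, the pattern keeps the quotes and removes only the first comma of each quoted section per pass.

```java
class SimpleQuotedCommaRegex {
    // Group 1: opening quote plus text up to the first comma.
    // Group 2: the rest of the quoted text plus the closing quote.
    static String removeFirstQuotedComma(String input) {
        return input.replaceAll("(\"[^\",]+),([^\"]+\")", "$1$2");
    }

    public static void main(String[] args) {
        String a = "123, \"Anders, Jr.\", John, john.anders@company.com,A";
        System.out.println(removeFirstQuotedComma(a));
        // prints: 123, "Anders Jr.", John, john.anders@company.com,A
    }
}
```

To also drop the quotes, a follow-up a.replace("\"", "") would do, assuming the data contains no quotes that need to be kept.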
