

Count number of commas within a string except for commas between double quotes

I have the following function to count the number of commas (or any other character) in a String without counting those inside double quotes. I want to know if there's a better way to achieve this, or whether you can find some case where this function can crash.

public int countCharOfString(char c, String s) {
    int numberOfC = 0;
    boolean doubleQuotesFound = false;
    for(int i = 0; i < s.length(); i++){
        if(s.charAt(i) == c && !doubleQuotesFound){
            numberOfC++;
        }else if(s.charAt(i) == c && doubleQuotesFound){
            continue;
        }else if(s.charAt(i) == '\"'){
            doubleQuotesFound = !doubleQuotesFound;
        }
    }
    return numberOfC;
}

Thanks for any advice.
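The question asks whether the function can crash. One quick way to probe that is to run it over a few edge inputs. This sketch copies the question's method into a hypothetical class `QuestionEdgeCases` (made `static` so `main` can call it); the values in the comments come from tracing the loop by hand. In these cases only a `null` argument throws. An unmatched quote silently stops all counting after it, and when `c` is `'"'` every quote is counted, because the first branch matches before the toggle branch is ever reached.

```java
public class QuestionEdgeCases {

    // The question's function, unchanged apart from being static.
    public static int countCharOfString(char c, String s) {
        int numberOfC = 0;
        boolean doubleQuotesFound = false;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c && !doubleQuotesFound) {
                numberOfC++;
            } else if (s.charAt(i) == c && doubleQuotesFound) {
                continue;
            } else if (s.charAt(i) == '"') {
                doubleQuotesFound = !doubleQuotesFound;
            }
        }
        return numberOfC;
    }

    public static void main(String[] args) {
        System.out.println(countCharOfString(',', ""));        // empty string: 0
        System.out.println(countCharOfString(',', ",,"));      // leading and adjacent commas: 2
        System.out.println(countCharOfString(',', "a,\"b,c")); // unmatched quote hides "b,c": 1
        System.out.println(countCharOfString('"', "a\"b\"c")); // counting the quote itself: 2
        try {
            countCharOfString(',', null);                      // s.length() on null
        } catch (NullPointerException e) {
            System.out.println("null input throws NullPointerException");
        }
    }
}
```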

This implementation has two differences:

  • Use CharSequence instead of String.
  • No need for a boolean value to track whether we are inside a quoted subsequence.

The function:

public static int countCharOfString(char target, CharSequence sequence) {

    int total = 0, length = sequence.length();

    for(int i = 0; i < length; i++){
        char c = sequence.charAt(i);
        if (c == '"') {
            // Skip the quoted sequence: stop on the closing quote, or at the end if it is missing
            for (i++; i < length && sequence.charAt(i)!='"'; i++) {}
        } else if (c == target) {
            total++;
        }
    }

    return total;
}
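Because the parameter type is `CharSequence`, the same method accepts a `StringBuilder` (or any other `CharSequence`) without a `toString()` copy. A small sketch, with the method copied into a hypothetical class `CharSequenceDemo`:

```java
public class CharSequenceDemo {

    // Counts target outside "..." sections; works on any CharSequence.
    public static int countCharOfString(char target, CharSequence sequence) {
        int total = 0, length = sequence.length();
        for (int i = 0; i < length; i++) {
            char c = sequence.charAt(i);
            if (c == '"') {
                // Jump to the closing quote (or the end if there is none).
                for (i++; i < length && sequence.charAt(i) != '"'; i++) {}
            } else if (c == target) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("a,b");
        sb.append(",\"x,y\",c");                                // builder now holds a,b,"x,y",c
        System.out.println(countCharOfString(',', sb));         // 3
        System.out.println(countCharOfString(',', "a,\"b,c"));  // 1: unmatched quote skips to the end
    }
}
```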
public static int countCharOfString(char c, String s)
{
    int numberOfC = 0;
    int innerC = 0;
    boolean holdDoubleQuotes = false;
    for(int i = 0; i < s.length(); i++)
    {
        char r = s.charAt(i);
        if(i == s.length() - 1 && r != '\"')
        {
            numberOfC += innerC;
            if(r == c) numberOfC++;
        }
        else if(r == c && !holdDoubleQuotes) numberOfC++;
        else if(r == c && holdDoubleQuotes) innerC++;
        else if(r == '\"' && holdDoubleQuotes)
        {
            holdDoubleQuotes = false;
            innerC = 0;
        }
        else if(r == '\"' && !holdDoubleQuotes) holdDoubleQuotes = true;
    }
    return numberOfC;
}

System.out.println(countCharOfString(',', "Hello, BRabbit27, how\",,,\" are, you?"));

OUTPUT:

3
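The `innerC` bookkeeping means a quote that is never closed is treated as an ordinary character: commas after it are added back when the last character is reached. The sketch below puts that function next to the question's function (both copied, in a hypothetical class `UnterminatedQuoteDemo`) so the difference is visible; on balanced input such as the example above they agree.

```java
public class UnterminatedQuoteDemo {

    // The innerC answer: commas inside an unclosed quote are counted at the end.
    public static int withInnerCount(char c, String s) {
        int numberOfC = 0;
        int innerC = 0;
        boolean holdDoubleQuotes = false;
        for (int i = 0; i < s.length(); i++) {
            char r = s.charAt(i);
            if (i == s.length() - 1 && r != '"') {
                numberOfC += innerC;
                if (r == c) numberOfC++;
            }
            else if (r == c && !holdDoubleQuotes) numberOfC++;
            else if (r == c && holdDoubleQuotes) innerC++;
            else if (r == '"' && holdDoubleQuotes) { holdDoubleQuotes = false; innerC = 0; }
            else if (r == '"' && !holdDoubleQuotes) holdDoubleQuotes = true;
        }
        return numberOfC;
    }

    // The question's function: an unclosed quote hides every comma after it.
    public static int questionVersion(char c, String s) {
        int numberOfC = 0;
        boolean doubleQuotesFound = false;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c && !doubleQuotesFound) {
                numberOfC++;
            } else if (s.charAt(i) == c && doubleQuotesFound) {
                continue;
            } else if (s.charAt(i) == '"') {
                doubleQuotesFound = !doubleQuotesFound;
            }
        }
        return numberOfC;
    }

    public static void main(String[] args) {
        String unclosed = "a,\"b,c";
        System.out.println(withInnerCount(',', unclosed));  // 2
        System.out.println(questionVersion(',', unclosed)); // 1
    }
}
```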

An alternative would be using regex:

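public static int countCharOfString(char c, String s)
{
   // Drop every complete "..." section first, then split on c (quoted so '.' or '|' stay literal).
   // The -1 limit keeps empty leading/trailing pieces, so a c at either end is still counted.
   String unquoted = s.replaceAll("\"[^\"]*\"", "");
   return unquoted.split(java.util.regex.Pattern.quote(String.valueOf(c)), -1).length - 1;
}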
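A different regex technique, swapped in here rather than taken from the answer, avoids split() altogether: one pattern matches either a complete quoted section or the target character, and only matches of the second kind are counted. The class and method names are hypothetical. Like the innerC answer, this treats an unmatched quote as an ordinary character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFindCount {

    // The quoted-section alternative is tried first, so a c inside "..." is
    // swallowed by that match; group 1 is non-null only for a bare c.
    public static int countOutsideQuotes(char c, String s) {
        Pattern p = Pattern.compile("\"[^\"]*\"|(" + Pattern.quote(String.valueOf(c)) + ")");
        Matcher m = p.matcher(s);
        int count = 0;
        while (m.find()) {
            if (m.group(1) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOutsideQuotes(',', "Hello, BRabbit27, how\",,,\" are, you?")); // 3
        System.out.println(countOutsideQuotes(',', "a,\"b,c"));                                // 2
    }
}
```

In practice the compiled `Pattern` could be cached per character if the method is called in a hot loop.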
  • You should not call charAt() several times inside the loop. Use a char variable.
  • You should not call length() on each iteration. Store it in an int before the loop.
  • You should avoid the duplicate comparison with c; use a nested if/else.
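Applying the three suggestions to the question's function might look like the sketch below (the class name `RefactoredCount` is hypothetical). The behaviour should match the original, including the case where `c` is `'"'`.

```java
public class RefactoredCount {

    // One charAt() per iteration, length() read once, and c compared only once.
    public static int countCharOfString(char c, String s) {
        int numberOfC = 0;
        boolean insideQuotes = false;
        final int length = s.length();
        for (int i = 0; i < length; i++) {
            char ch = s.charAt(i);
            if (ch == c) {
                if (!insideQuotes) {
                    numberOfC++;
                }
            } else if (ch == '"') {
                insideQuotes = !insideQuotes;
            }
        }
        return numberOfC;
    }

    public static void main(String[] args) {
        System.out.println(countCharOfString(',', "Hello, BRabbit27, how\",,,\" are, you?")); // 3
    }
}
```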

Maybe not the fastest...

public int countCharOfString(char c, String s) {
    // Remove every complete "..." section; [^"]* (unlike .*?) also spans line breaks
    final String removedQuoted = s.replaceAll("\"[^\"]*\"", "");
    int total = 0;
    for(int i = 0; i < removedQuoted.length(); ++i)
        if(removedQuoted.charAt(i) == c)
            ++total;
    return total;
}
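Since Java 8 the counting loop after `replaceAll()` can also be written with `String.chars()`. This sketch keeps the answer's idea of deleting quoted sections first, uses `[^\"]*` rather than `.*?` so quoted sections containing a line break are removed too (`.` does not match line terminators by default), and lives in a hypothetical class `StreamCount`.

```java
public class StreamCount {

    // Delete complete "..." sections, then count the remaining occurrences of c.
    public static long countCharOfString(char c, String s) {
        return s.replaceAll("\"[^\"]*\"", "")
                .chars()
                .filter(ch -> ch == c)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countCharOfString(',', "Hello, BRabbit27, how\",,,\" are, you?")); // 3
        System.out.println(countCharOfString(',', "a,\"b,\nc\",d"));                        // 2
    }
}
```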

It takes a large string to make a big difference.

This code is faster because it performs on average 1.5 checks per character instead of 3. It does this by using two loops, one for the quoted state and one for the unquoted state.

// Requires: import java.util.Random; (all methods below belong to the same class)
public static void main(String... args) {
    String s = generateString(20 * 1024 * 1024);
    for (int i = 0; i < 15; i++) {
        long start = System.nanoTime();
        countCharOfString(',', s);
        long mid = System.nanoTime();
        countCharOfString2(',', s);
        long end = System.nanoTime();
        System.out.printf("countCharOfString() took %.3f ms, countCharOfString2() took %.3f ms%n",
                (mid - start) / 1e6, (end - mid) / 1e6);
    }
}

private static String generateString(int length) {
    StringBuilder sb = new StringBuilder(length);
    Random rand = new Random(1);
    while (sb.length() < length)
        sb.append((char) (rand.nextInt(96) + 32)); // includes , and "
    return sb.toString();
}

public static int countCharOfString2(char c, String s) {
    int numberOfC = 0, i = 0;
    while (i < s.length()) {
        // not quoted
        while (i < s.length()) {
            char ch = s.charAt(i++);
            if (ch == c)
                numberOfC++;
            else if (ch == '"')
                break;
        }
        // quoted
        while (i < s.length()) {
            char ch = s.charAt(i++);
            if (ch == '"')
                break;
        }
    }
    return numberOfC;
}


public static int countCharOfString(char c, String s) {
    int numberOfC = 0;
    boolean doubleQuotesFound = false;
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) == c && !doubleQuotesFound) {
            numberOfC++;
        } else if (s.charAt(i) == c && doubleQuotesFound) {
            continue;
        } else if (s.charAt(i) == '\"') {
            doubleQuotesFound = !doubleQuotesFound;
        }
    }
    return numberOfC;
}

prints

countCharOfString() took 33.348 ms, countCharOfString2() took 31.381 ms
countCharOfString() took 28.265 ms, countCharOfString2() took 25.801 ms
countCharOfString() took 28.142 ms, countCharOfString2() took 14.576 ms
countCharOfString() took 28.372 ms, countCharOfString2() took 14.540 ms
countCharOfString() took 28.191 ms, countCharOfString2() took 14.616 ms
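Before trusting a timing comparison it is worth confirming that both versions return the same counts. This sketch copies the two methods into a hypothetical class `AgreementCheck` and compares them on random strings built from commas, quotes and letters; the alphabet, lengths, trial count and seed are arbitrary choices.

```java
import java.util.Random;

public class AgreementCheck {

    // The question's function.
    public static int countCharOfString(char c, String s) {
        int numberOfC = 0;
        boolean doubleQuotesFound = false;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c && !doubleQuotesFound) {
                numberOfC++;
            } else if (s.charAt(i) == c && doubleQuotesFound) {
                continue;
            } else if (s.charAt(i) == '"') {
                doubleQuotesFound = !doubleQuotesFound;
            }
        }
        return numberOfC;
    }

    // The two-loop version.
    public static int countCharOfString2(char c, String s) {
        int numberOfC = 0, i = 0;
        while (i < s.length()) {
            while (i < s.length()) {          // not quoted
                char ch = s.charAt(i++);
                if (ch == c) numberOfC++;
                else if (ch == '"') break;
            }
            while (i < s.length()) {          // quoted
                if (s.charAt(i++) == '"') break;
            }
        }
        return numberOfC;
    }

    // Number of random inputs on which the two methods disagree.
    public static int countMismatches(int trials, long seed) {
        Random rand = new Random(seed);
        String alphabet = ",\"ab";
        int mismatches = 0;
        for (int t = 0; t < trials; t++) {
            StringBuilder sb = new StringBuilder();
            int len = rand.nextInt(20);
            for (int k = 0; k < len; k++) {
                sb.append(alphabet.charAt(rand.nextInt(alphabet.length())));
            }
            String s = sb.toString();
            if (countCharOfString(',', s) != countCharOfString2(',', s)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(10_000, 42) + " mismatches"); // 0 mismatches
    }
}
```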

Simpler, less bug-prone (and yes, less performant than walking the string char by char and keeping track of everything by hand):

public static int countCharOfString(char c, String s) {
  s = s.replaceAll("\"[^\"]*\"", ""); // drop quoted sections, including ones that span lines
  int cnt = 0;
  for (int foundAt = s.indexOf(c); foundAt > -1; foundAt = s.indexOf(c, foundAt+1)) 
    cnt++;
  return cnt;
}

You could also use a regex and String.split().

It might look something like this:

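public int countNonQuotedOccurrences(String inputString, char searchChar)
{
  // Pattern.quote() keeps regex metacharacters such as '.' or '|' literal
  String regexPattern = "[^\"]" + java.util.regex.Pattern.quote(String.valueOf(searchChar)) + "[^\"]";
  return inputString.split(regexPattern).length - 1;
}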

Disclaimer:

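This just shows the basic approach.

The above code will not check for searchChar at the beginning or end of the string. You could either check for this manually or add to regexPattern.

Note also that the pattern consumes the character on each side of searchChar, so closely spaced occurrences are missed ("a,b,c" yields 1 instead of 2), and a searchChar inside quotes is still counted whenever its immediate neighbours are not quote characters (as in "a,b" with the quotes included).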
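One way to handle the quotes, as well as the beginning and end of the string, inside the regex itself is the common "even number of quotes ahead" lookahead. This is a swapped-in technique, not the answer's pattern; the class name is hypothetical, and it assumes quotes are balanced (with an unmatched quote it counts different occurrences than the question's function). Because the lookahead rescans the rest of the string for every candidate character, it can be quadratic on long inputs.

```java
import java.util.regex.Pattern;

public class LookaheadSplit {

    // Split on searchChar only when an even number of double quotes follows it,
    // i.e. when it is not inside a "..." section. The -1 limit keeps empty
    // leading/trailing pieces, so occurrences at either end are counted.
    public static int countNonQuotedOccurrences(String inputString, char searchChar) {
        String regexPattern = Pattern.quote(String.valueOf(searchChar))
                + "(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
        return inputString.split(regexPattern, -1).length - 1;
    }

    public static void main(String[] args) {
        System.out.println(countNonQuotedOccurrences("Hello, BRabbit27, how\",,,\" are, you?", ',')); // 3
        System.out.println(countNonQuotedOccurrences(",a,", ','));                                   // 2
    }
}
```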
