Count number of commas within a string except for commas between double quotes
I have the following function to count the number of commas (or any other character) in a String without counting those inside double quotes. I want to know if there's a better way to achieve this, or whether you can find a case where this function can crash.
public int countCharOfString(char c, String s) {
    int numberOfC = 0;
    boolean doubleQuotesFound = false;
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) == c && !doubleQuotesFound) {
            numberOfC++;
        } else if (s.charAt(i) == c && doubleQuotesFound) {
            continue;
        } else if (s.charAt(i) == '\"') {
            doubleQuotesFound = !doubleQuotesFound;
        }
    }
    return numberOfC;
}
Thanks for any advice.
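On the crash question, a few inputs seem worth checking. The sketch below copies the function above into a hypothetical `EdgeCases` class (the class name and the `throwsOnNull` helper are illustrative and not part of the original post) and probes a null string, an unclosed quote, and the case where the searched character is itself the double quote.

```java
class EdgeCases {
    // Copy of the function from the question, made static for this demo.
    static int countCharOfString(char c, String s) {
        int numberOfC = 0;
        boolean doubleQuotesFound = false;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c && !doubleQuotesFound) {
                numberOfC++;
            } else if (s.charAt(i) == c && doubleQuotesFound) {
                continue;
            } else if (s.charAt(i) == '"') {
                doubleQuotesFound = !doubleQuotesFound;
            }
        }
        return numberOfC;
    }

    // Hypothetical helper: reports whether a null input throws.
    static boolean throwsOnNull() {
        try {
            countCharOfString(',', null);
            return false;
        } catch (NullPointerException e) {
            return true; // s.length() is called on a null reference
        }
    }
}
```

With an unclosed quote, everything after the quote is treated as quoted, so later commas are silently ignored. When the searched character is `"`, the first branch always wins and the quote state never toggles.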
This implementation has two differences:
It takes a CharSequence instead of a String.
It does not need a boolean value to track whether we are inside a quoted subsequence.
The function:
public static int countCharOfString(char quote, CharSequence sequence) {
    int total = 0, length = sequence.length();
    for (int i = 0; i < length; i++) {
        char c = sequence.charAt(i);
        if (c == '"') {
            // Skip quoted sequence
            for (i++; i < length && sequence.charAt(i) != '"'; i++) {}
        } else if (c == quote) {
            total++;
        }
    }
    return total;
}
public static int countCharOfString(char c, String s)
{
    int numberOfC = 0;
    int innerC = 0;
    boolean holdDoubleQuotes = false;
    for (int i = 0; i < s.length(); i++)
    {
        char r = s.charAt(i);
        if (i == s.length() - 1 && r != '\"')
        {
            numberOfC += innerC;
            if (r == c) numberOfC++;
        }
        else if (r == c && !holdDoubleQuotes) numberOfC++;
        else if (r == c && holdDoubleQuotes) innerC++;
        else if (r == '\"' && holdDoubleQuotes)
        {
            holdDoubleQuotes = false;
            innerC = 0;
        }
        else if (r == '\"' && !holdDoubleQuotes) holdDoubleQuotes = true;
    }
    return numberOfC;
}
System.out.println(countCharOfString(',', "Hello, BRabbit27, how\",,,\" are, you?"));
OUTPUT:
3
An alternative would be using regex:
public static int countCharOfString(char c, String s)
{
    s = " " + s + " "; // To make the first and last commas to be counted
    return s.split("[^\"" + c + "*\"][" + c + "]").length - 1;
}
Avoid calling charAt() several times inside the loop. Use a char variable.
Avoid calling length() for each iteration. Use an int before the loop.
Avoid repeated comparisons with c - use nested if/else.
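Applied together, these three suggestions might look like the following minimal sketch. The class name `TidyCounter` and the method name `countOutsideQuotes` are illustrative, not from the original post. The target comparison is kept first so the behavior matches the function in the question.

```java
class TidyCounter {
    static int countOutsideQuotes(char target, String s) {
        int count = 0;
        boolean inQuotes = false;
        final int length = s.length();       // length() read once, before the loop
        for (int i = 0; i < length; i++) {
            final char ch = s.charAt(i);     // charAt() read once per iteration
            if (ch == target) {              // single comparison with the target...
                if (!inQuotes) {             // ...then a nested check of the quote state
                    count++;
                }
            } else if (ch == '"') {
                inQuotes = !inQuotes;
            }
        }
        return count;
    }
}
```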
Maybe not the fastest...
public int countCharOfString(char c, String s) {
    final String removedQuoted = s.replaceAll("\".*?\"", "");
    int total = 0;
    for (int i = 0; i < removedQuoted.length(); ++i)
        if (removedQuoted.charAt(i) == c)
            ++total;
    return total;
}
It takes a large string to make a big difference.
The reason this code is faster is that it performs on average 1.5 checks per loop iteration instead of 3. It does this by using two loops, one for the quoted state and one for the unquoted state.
import java.util.Random;

public static void main(String... args) {
    String s = generateString(20 * 1024 * 1024);
    for (int i = 0; i < 15; i++) {
        long start = System.nanoTime();
        countCharOfString(',', s);
        long mid = System.nanoTime();
        countCharOfString2(',', s);
        long end = System.nanoTime();
        System.out.printf("countCharOfString() took %.3f ms, countCharOfString2() took %.3f ms%n",
                (mid - start) / 1e6, (end - mid) / 1e6);
    }
}

private static String generateString(int length) {
    StringBuilder sb = new StringBuilder(length);
    Random rand = new Random(1);
    while (sb.length() < length)
        sb.append((char) (rand.nextInt(96) + 32)); // includes , and "
    return sb.toString();
}

public static int countCharOfString2(char c, String s) {
    int numberOfC = 0, i = 0;
    while (i < s.length()) {
        // not quoted
        while (i < s.length()) {
            char ch = s.charAt(i++);
            if (ch == c)
                numberOfC++;
            else if (ch == '"')
                break;
        }
        // quoted
        while (i < s.length()) {
            char ch = s.charAt(i++);
            if (ch == '"')
                break;
        }
    }
    return numberOfC;
}

public static int countCharOfString(char c, String s) {
    int numberOfC = 0;
    boolean doubleQuotesFound = false;
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) == c && !doubleQuotesFound) {
            numberOfC++;
        } else if (s.charAt(i) == c && doubleQuotesFound) {
            continue;
        } else if (s.charAt(i) == '\"') {
            doubleQuotesFound = !doubleQuotesFound;
        }
    }
    return numberOfC;
}
prints
countCharOfString() took 33.348 ms, countCharOfString2() took 31.381 ms
countCharOfString() took 28.265 ms, countCharOfString2() took 25.801 ms
countCharOfString() took 28.142 ms, countCharOfString2() took 14.576 ms
countCharOfString() took 28.372 ms, countCharOfString2() took 14.540 ms
countCharOfString() took 28.191 ms, countCharOfString2() took 14.616 ms
Simpler, less bug-prone (and yes, less performant than walking the string char by char and keeping track of everything by hand):
public static int countCharOfString(char c, String s) {
    s = s.replaceAll("\".*?\"", "");
    int cnt = 0;
    for (int foundAt = s.indexOf(c); foundAt > -1; foundAt = s.indexOf(c, foundAt + 1))
        cnt++;
    return cnt;
}
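One caveat with the `replaceAll` approach: in Java regexes `.` does not match line terminators by default, so a quoted section that spans several lines would not be removed by `".*?"`. A minimal sketch of a variant that uses a negated character class instead (the class name `MultilineQuoteCounter` and the method name are illustrative, not from the original answer):

```java
class MultilineQuoteCounter {
    static int countOutsideQuotes(char target, String s) {
        // "[^\"]*" matches a quoted run, including any line breaks inside it
        String unquoted = s.replaceAll("\"[^\"]*\"", "");
        int count = 0;
        for (int at = unquoted.indexOf(target); at > -1; at = unquoted.indexOf(target, at + 1)) {
            count++;
        }
        return count;
    }
}
```

For single-line input this should behave the same as the version above. The difference only shows up when a quoted section contains a newline.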
You could also use a regex and String.split()
It might look something like this:
public int countNonQuotedOccurrences(String inputString, char searchChar)
{
    // searchChar must not be directly preceded or followed by a double quote
    String regexPattern = "[^\"]" + searchChar + "[^\"]";
    return inputString.split(regexPattern).length - 1;
}
Disclaimer:
This just shows the basic approach.
The above code will not check for searchChar at the beginning or end of the string.
You could either check for this manually or add to regexPattern.
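As one possible way around the beginning/end limitation, here is a sketch that swaps `String.split()` for a `java.util.regex.Matcher` loop. This is a different technique from the snippet above, not a fix to its pattern: the regex matches either a whole quoted run or a single occurrence of the character, and only the latter is counted. The class name `RegexCounter` is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCounter {
    static int countNonQuotedOccurrences(String inputString, char searchChar) {
        // Alternative 1 consumes a complete quoted run; alternative 2 is the searched char.
        // Pattern.quote keeps regex metacharacters such as '.' or '*' literal.
        Pattern pattern = Pattern.compile(
                "\"[^\"]*\"|" + Pattern.quote(String.valueOf(searchChar)));
        Matcher matcher = pattern.matcher(inputString);
        int count = 0;
        while (matcher.find()) {
            // A match starting with a double quote is a quoted run, so skip it.
            if (matcher.group().charAt(0) != '"') {
                count++;
            }
        }
        return count;
    }
}
```

Note that with this sketch an unclosed quote is not treated as opening a quoted section, so characters after it are still counted. Whether that is the desired behavior depends on the input format.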