How to replace non-ASCII characters in a sequence?

Question

Essentially, what this code does is:

Take an input.
Replace each run of the same character that is longer than 2 with the run length followed by the character itself (e.g. jjjkkkkkllll becomes 3j5k4l). The input does not contain any digits.
Return the result.

The code:

private String replaceConsecutiveChars(String data) {
    char[] dataChars = data.toCharArray();

    int i = 0;
    int k = 0;
    Character charType = null;
    for(Character c : dataChars) {
        if(k == dataChars.length - 1 && i >= 2) {
            data = data.replace(repeat(String.valueOf(charType), ++i), (i + Character.toString(charType)));
            break;
        }

        if(i == 0) {
            charType = c;
            i++;
        }else if(c == charType) {
            i++;
        }else if(c != charType && i > 2) {
            data = data.replace(repeat(String.valueOf(charType), i), (i + Character.toString(charType)));

            i = 1;
            charType = c;
        }else if(c != charType && i <= 2) {
            i = 1;
            charType = c;
        }

        k++;
    }

    return data;
}

private String repeat(String s, int n) {
    return Stream.generate(() -> s).limit(n).collect(Collectors.joining(""));
}

However, my implementation only seems to work with the ASCII character set, and I am trying to get it to work with Unicode characters as well. For example:

The input ddddddddkkkkkpppp will correctly output 8d5k4p
The input êêêêÌÌÌÌÌÌÌØØØ will incorrectly output êêêêÌÌÌÌÌÌÌØØØ (expected 4ê7Ì3Ø)
The input rrrrrêêêêÌÌÌÌÌkkkkØØØ will incorrectly output 5rêêêêÌÌÌÌÌ4kØØØ (expected 5r4ê5Ì4k3Ø)

Why is this?

In addition, is there a better way I could do this than the way I'm doing it right now?

Answer 1

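You are comparing instances of Character using ==, which will not work as expected because the operator compares object references instead of values. It appears to work for ASCII input only because autoboxing reuses cached Character objects for the values \u0000 to \u007F, so equal ASCII characters box to the same object. Characters such as ê or Ì lie outside that range and typically box to distinct objects, so == returns false even though the values are equal.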
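To make the difference concrete, here is a minimal sketch that compares boxed characters both ways. The class and method names (BoxedCharComparison, sameReference, sameValue) are made up for illustration and are not part of the question's code. The result for the non-ASCII case reflects how common JVMs behave; the language only guarantees caching for \u0000 to \u007F.

```java
class BoxedCharComparison {
    // Compares by reference, like c == charType in the question's loop.
    static boolean sameReference(Character a, Character b) {
        return a == b;
    }

    // Compares by value, which is what the algorithm actually needs.
    static boolean sameValue(Character a, Character b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        char ascii = 'd';
        char nonAscii = '\u00EA'; // the character ê

        // Guaranteed true: both arguments box to the same cached object.
        System.out.println(sameReference(ascii, ascii));

        // Typically false: outside the cached range each boxing
        // usually creates a new Character object.
        System.out.println(sameReference(nonAscii, nonAscii));

        // True in both cases, because equals compares the char values.
        System.out.println(sameValue(ascii, ascii));
        System.out.println(sameValue(nonAscii, nonAscii));
    }
}
```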

A simple quick fix is to change the type of the loop variable in the for-loop:

for (char c : dataChars) {
    // loop body unchanged
}

Notice the change of type (Character to char). Because c is now a primitive, charType is automatically unboxed to char when it is compared to c, so the comparison is by value.

Another solution is to keep Character and replace every c == charType with c.equals(charType) (and every c != charType with !c.equals(charType)), so that values are compared instead of references.
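As for whether there is a better way: one option is to build the result in a single pass with a StringBuilder, working on primitive char values throughout. The following is only a sketch under the question's stated rules (runs longer than 2 are replaced, shorter runs are kept, input has no digits). The class name RunLengthSketch is hypothetical, and the code works on UTF-16 char units, so characters outside the Basic Multilingual Plane would need code point handling instead.

```java
class RunLengthSketch {
    static String replaceConsecutiveChars(String data) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (start < data.length()) {
            char current = data.charAt(start);

            // Find the end (exclusive) of the run of identical chars.
            int end = start;
            while (end < data.length() && data.charAt(end) == current) {
                end++;
            }

            int runLength = end - start;
            if (runLength > 2) {
                // Long run: emit the count followed by the character.
                out.append(runLength).append(current);
            } else {
                // Short run: copy it through unchanged.
                out.append(data, start, end);
            }
            start = end;
        }
        return out.toString();
    }
}
```

With this sketch, ddddddddkkkkkpppp gives 8d5k4p and êêêêÌÌÌÌÌÌÌØØØ gives 4ê7Ì3Ø. It also avoids calling String.replace inside the loop, which rewrites every matching substring in the text rather than only the run currently being examined.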

solution1
4 ACCEPTED 2017-07-06 18:57:57
