
How to replace non-ASCII characters in a sequence?

Essentially, what this code does is:

  • Take an input.
  • Replace each run of more than two identical consecutive characters with the length of the run followed by the character itself (e.g. jjjkkkkkllll becomes 3j5k4l). The input does not contain any digits.
  • Return the result.

The code:

private String replaceConsecutiveChars(String data) {
    char[] dataChars = data.toCharArray();

    int i = 0;
    int k = 0;
    Character charType = null;
    for(Character c : dataChars) {
        if(k == dataChars.length - 1 && i >= 2) {
            data = data.replace(repeat(String.valueOf(charType), ++i), (i + Character.toString(charType)));
            break;
        }

        if(i == 0) {
            charType = c;
            i++;
        }else if(c == charType) {
            i++;
        }else if(c != charType && i > 2) {
            data = data.replace(repeat(String.valueOf(charType), i), (i + Character.toString(charType)));

            i = 1;
            charType = c;
        }else if(c != charType && i <= 2) {
            i = 1;
            charType = c;
        }

        k++;
    }

    return data;
}

private String repeat(String s, int n) {
    return Stream.generate(() -> s).limit(n).collect(Collectors.joining(""));
}

However, my implementation only seems to work with ASCII characters, and I am trying to get it to work with the full Unicode character set. For example:

  • The input ddddddddkkkkkpppp will correctly output 8d5k4p .
  • The input êêêêÌÌÌÌÌÌÌØØØ will incorrectly output êêêêÌÌÌÌÌÌÌØØØ
  • The input rrrrrêêêêÌÌÌÌÌkkkkØØØ will incorrectly output 5rêêêêÌÌÌÌÌ4kØØØ .

Why is this?

In addition, is there a better way I could do this than the way I'm doing it right now?

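You are comparing instances of Character using ==, which does not work as expected because, on objects, the operator compares references instead of values. It only appears to work for ASCII input because autoboxing reuses cached Character instances for values from U+0000 to U+007F. Characters such as ê, Ì and Ø fall outside that range, so each boxed copy is typically a separate object and == returns false.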
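The difference between the two comparisons can be seen in isolation with a small sketch. The class and method names here (BoxedCharComparison, sameReference, sameValue) are made up for illustration and are not part of the original code. The non-ASCII character ê is written as the escape \u00EA so that the file compiles regardless of source encoding.

```java
class BoxedCharComparison {
    // Compares two boxed characters by reference, as the loop in the question does.
    static boolean sameReference(Character a, Character b) {
        return a == b;
    }

    // Compares two boxed characters by value.
    static boolean sameValue(Character a, Character b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        char[] chars = {'d', 'd', '\u00EA', '\u00EA'};

        // Assigning a char to a Character autoboxes it (Character.valueOf).
        Character d1 = chars[0], d2 = chars[1];
        Character e1 = chars[2], e2 = chars[3];

        // true: 'd' lies in the range U+0000 to U+007F, which is always cached
        System.out.println(sameReference(d1, d2));

        // false on OpenJDK, where only that ASCII range is cached
        System.out.println(sameReference(e1, e2));

        // true: equals compares the wrapped char values
        System.out.println(sameValue(e1, e2));
    }
}
```

Only the first and third results are guaranteed by the language specification. The second depends on how far the implementation's cache extends, which is one more reason not to rely on == for boxed values.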

A quick fix is to change the type of the loop variable in the for-loop:

for (char c : dataChars) {
    // loop body unchanged
}

Notice the change of type (Character to char). This way charType is automatically unboxed to the primitive char when it is compared with c, so the comparison is made on values.

Another solution is to replace every c == charType with c.equals(charType), and every c != charType with !c.equals(charType), so that values are compared instead of references.
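As for whether there is a better way: one possible alternative, offered as a sketch rather than the definitive approach, is to build the result in a single pass with a StringBuilder. This uses only primitive char values, so the boxing problem cannot occur. It also avoids String.replace, which substitutes every occurrence of the repeated substring in the whole string rather than only the run that was just scanned. The class name RunLengthSketch is hypothetical. The sketch works on UTF-16 char units, so characters that need a surrogate pair are not treated as a single symbol.

```java
class RunLengthSketch {
    // Collapses each run of more than two identical chars into "<count><char>".
    // Runs of length one or two are copied unchanged.
    static String replaceConsecutiveChars(String data) {
        StringBuilder out = new StringBuilder(data.length());
        int i = 0;
        while (i < data.length()) {
            char current = data.charAt(i); // primitive char: == compares values

            // Advance runEnd to the first index that no longer matches current.
            int runEnd = i + 1;
            while (runEnd < data.length() && data.charAt(runEnd) == current) {
                runEnd++;
            }

            int runLength = runEnd - i;
            if (runLength > 2) {
                out.append(runLength).append(current);
            } else {
                out.append(data, i, runEnd); // keep short runs as they are
            }
            i = runEnd;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints 8d5k4p
        System.out.println(replaceConsecutiveChars("ddddddddkkkkkpppp"));

        // Four copies of U+00EA followed by three of U+00D8, written as
        // escapes so the file compiles under any source encoding.
        System.out.println(replaceConsecutiveChars(
                "\u00EA\u00EA\u00EA\u00EA\u00D8\u00D8\u00D8"));
    }
}
```

The second call prints 4 followed by ê, then 3 followed by Ø, which is the behaviour the question was after for non-ASCII input.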
