
Removing duplicate words from a string

I have a string like:

Hello how how how are are you you?

I love cookies cookies, apples and pancakes pancakes.

I want the output to be:

Hello how are you?

I love cookies, apples and pancakes.

So far I have written:

String[] s = input.split(" ");
String prev = s[0];
String ans = prev + " ";

for (int i = 1; i < s.length; i++) {

    if (!prev.equals(s[i])) {
        prev = s[i];
        ans += prev + " ";
    }
}

System.out.println(ans);

The output I get is:

Hello how are you you?

I love cookies cookies, apples and pancakes pancakes.

I need some help with the logic for handling punctuation such as , . ! ?

You can use a regex to do this for you. Sample code:

String regex = "\\b(\\w+)\\b\\s*(?=.*\\b\\1\\b)";
input = input.replaceAll(regex,"");

  1. \\b matches a word boundary: the position between a word character and a non-word character, or the start or end of the string.
  2. (\\w+) captures one or more word characters (letters, digits, underscore) as group 1.
  3. \\b matches the word boundary at the end of the captured word.
  4. \\s matches any whitespace character (spaces, tabs, line breaks).
  5. * matches zero or more of the preceding token.
  6. (?= ... ) is a positive lookahead: it requires what follows to match, without consuming it or including it in the result.
  7. . matches any character except line breaks.
  8. \\1 matches the same text as capture group 1 from step 2.

Note: It is important to use word boundaries here to avoid matching partial words.

Here's a link to a regex demo and explanation: RegexDemo
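One thing to keep in mind: because the lookahead uses .*, the regex above drops a word whenever the same word appears anywhere later on the line, not only when the copies are next to each other. If only consecutive repeats should go, a variant along these lines might fit better. This is a minimal sketch, not the answer's own code; the class name ConsecutiveDuplicateWords and the method collapse are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class ConsecutiveDuplicateWords {

    // A word followed by one or more whitespace-separated copies of itself.
    private static final Pattern REPEATED_WORD =
            Pattern.compile("\\b(\\w+)(?:\\s+\\1\\b)+");

    static String collapse(String input) {
        // Replace each run of repeats with a single copy of the captured word.
        return REPEATED_WORD.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello how how how are are you you?"));
        System.out.println(collapse("I love cookies cookies, apples and pancakes pancakes."));
        // Non-adjacent repeats are left alone.
        System.out.println(collapse("to be or not to be"));
    }
}
```

Punctuation that follows the last copy stays in place, because the match ends at the word boundary before it.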

You should use a secondary variable to store your words without the punctuation.

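String[] s = input.split(" ");
StringBuilder ans = new StringBuilder();

for (int i = 0; i < s.length - 1; i++) {

    // Compare the words with punctuation stripped, but keep the originals for output.
    String currentAux = s[i].replaceAll("[,.!?]", "");
    String nextAux = s[i + 1].replaceAll("[,.!?]", "");

    // Skip a word that is repeated by the next one; the last copy keeps its punctuation.
    if (nextAux.equals(currentAux)) {
        continue;
    }

    ans.append(s[i]).append(" ");
}

// The last word has no successor to compare against, so always keep it.
ans.append(s[s.length - 1]);

System.out.println(ans);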
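The same idea can be wrapped in a reusable method and run against both inputs from the question. This is only a sketch under a few assumptions: the names PunctuationAwareDedup, stripPunctuation and removeAdjacentDuplicates are invented here, and the input is split on any run of whitespace rather than on single spaces.

```java
import java.util.StringJoiner;

// Hypothetical class and method names, used only for illustration.
final class PunctuationAwareDedup {

    // Strips the punctuation marks named in the question.
    static String stripPunctuation(String word) {
        return word.replaceAll("[,.!?]", "");
    }

    static String removeAdjacentDuplicates(String input) {
        // Assumption: words are separated by one or more whitespace characters.
        String[] words = input.trim().split("\\s+");
        StringJoiner result = new StringJoiner(" ");
        for (int i = 0; i < words.length; i++) {
            boolean sameAsNext = i + 1 < words.length
                    && stripPunctuation(words[i]).equals(stripPunctuation(words[i + 1]));
            // Keep only the last word of each run, so its punctuation survives.
            if (!sameAsNext) {
                result.add(words[i]);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAdjacentDuplicates("Hello how how how are are you you?"));
        System.out.println(removeAdjacentDuplicates("I love cookies cookies, apples and pancakes pancakes."));
    }
}
```

Because only the last word of a run is kept, punctuation attached to an earlier copy (as in "cookies, cookies") would be lost.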

You can use java.util.StringTokenizer to tokenize the words. Make sure to set the delimiters to split the words. In your case they are spaces, commas and full stops. This can help you to split the words without the punctuation marks. Then you can compare the previous token with the current and if they are equal you can ignore it.

You can try this code snippet:

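import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer;

String s = "I love cookies cookies, apples and pancakes pancakes.";

// Passing true returns the delimiters as tokens too, so spacing and punctuation can be rebuilt.
StringTokenizer tokenizer = new StringTokenizer(s, " ,.", true);

List<String> duplicateRemovedTokenList = new LinkedList<>();

String prevToken = null;

while (tokenizer.hasMoreTokens()) {

    String currentToken = tokenizer.nextToken();

    if (currentToken.equals(" ")) {
        duplicateRemovedTokenList.add(currentToken);
        continue;
    }

    if (!currentToken.equals(prevToken)) {
        duplicateRemovedTokenList.add(currentToken);
        prevToken = currentToken;
    } else {
        // Duplicate word: also drop the space added just before it,
        // otherwise the result contains "cookies ," instead of "cookies,".
        duplicateRemovedTokenList.remove(duplicateRemovedTokenList.size() - 1);
    }
}

// String.join (Java 8+) avoids the Apache Commons StringUtils dependency.
String duplicateRemovedString = String.join("", duplicateRemovedTokenList);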
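The delimiter set " ,." does not cover the ? and ! mentioned in the question, so "you you?" would not be treated as a repeat. A possible extension is sketched below, assuming " ,.!?" as the delimiter set; the class name TokenizerDedup and the method removeAdjacentDuplicates are hypothetical. Instead of a token list, it holds spaces back in a buffer until it knows whether the next token is kept.

```java
import java.util.StringTokenizer;

// Hypothetical class name, used only for illustration.
final class TokenizerDedup {

    // Assumed delimiter set: space plus the punctuation marks named in the question.
    private static final String DELIMITERS = " ,.!?";

    static String removeAdjacentDuplicates(String input) {
        // true: delimiters are returned as tokens as well.
        StringTokenizer tokenizer = new StringTokenizer(input, DELIMITERS, true);
        StringBuilder result = new StringBuilder();
        StringBuilder pendingSpaces = new StringBuilder();
        String prevToken = null;

        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.equals(" ")) {
                // Hold spaces back until we know the next token is kept.
                pendingSpaces.append(token);
                continue;
            }
            if (token.equals(prevToken)) {
                // Repeated token: drop it together with the spaces before it.
                pendingSpaces.setLength(0);
                continue;
            }
            result.append(pendingSpaces).append(token);
            pendingSpaces.setLength(0);
            prevToken = token;
        }
        // Preserve any trailing spaces from the input.
        return result.append(pendingSpaces).toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAdjacentDuplicates("Hello how how how are are you you?"));
        System.out.println(removeAdjacentDuplicates("I love cookies cookies, apples and pancakes pancakes."));
    }
}
```

Since punctuation marks are tokens too, a repeated mark such as "!!" would also be collapsed to one; whether that is acceptable depends on the input.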
