Given a string, remove all special characters except the hyphen and count the number of words

Given a string such as "this is high-tech job market in which? we make. careers", I have to remove all special characters except the hyphen and count the number of words, so the output should be 10 in this case. I wrote the program below, but it does not pass the test cases.

public int countWords(String str) {
    if(str.isEmpty() || str==null)
       return 0;
    String replacedString = str.replaceAll("[^a-zA-Z0-9- ]","");
    String[] arrWords = replacedString.split("\\s+");
    return arrWords.length;
}

You can use the regex [\p{Punct}&&[^-]], where \p{Punct} matches a punctuation character and &&[^-] excludes the hyphen. If you want to remove everything other than letters, digits, hyphens and whitespace, use [^\p{Alnum}\s-] instead, where \p{Alnum} matches an alphanumeric character.

Demo:

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String str = "this is high-tech job market in which? we make. careers";

        String[] arr = str.replaceAll("[\\p{Punct}&&[^-]]", "").split("\\s+");

        System.out.println(Arrays.toString(arr));

        int count = arr.length;

        System.out.println(count);
    }
}

Output:

[this, is, high-tech, job, market, in, which, we, make, careers]
10
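
The two patterns are not interchangeable. As a rough illustration (the class name RegexCompare, the helper names and the sample string are made up for this sketch): without flags, \p{Punct} and \p{Alnum} cover ASCII only, so the first pattern leaves accented letters alone while the second strips them.

```java
class RegexCompare {

    // Removes ASCII punctuation except the hyphen.
    static String stripPunct(String s) {
        return s.replaceAll("[\\p{Punct}&&[^-]]", "");
    }

    // Removes everything that is not an ASCII letter, ASCII digit, whitespace or hyphen.
    static String keepAlnum(String s) {
        return s.replaceAll("[^\\p{Alnum}\\s-]", "");
    }

    public static void main(String[] args) {
        // Sample text containing two accented letters, written as Unicode escapes
        // so the result does not depend on the source file encoding.
        String s = "na\u00EFve co-op, caf\u00E9!";
        System.out.println(stripPunct(s)); // accented letters are kept
        System.out.println(keepAlnum(s));  // accented letters are removed
    }
}
```

Which one is appropriate depends on whether the test cases treat non-ASCII letters as part of a word.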

First, the null / empty condition should be in the opposite order:

    if(str==null || str.isEmpty())

Do you see why? (Hint: || is evaluated left to right and short-circuits, so in your version str.isEmpty() runs before the null check.)

Additionally, should a "-" (minus sign) be removed or not?
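
To make both points concrete, here is a minimal sketch of the method with the null check first. The class name WordCount and the trim() step are additions for illustration, not part of the original question. trim() is there because split("\\s+") yields a leading empty string when the input starts with whitespace, and splitting an empty string yields a one-element array. Note that a standalone "-" still counts as a word in this version; whether it should is a question for the test cases.

```java
class WordCount {

    // Counts words after removing everything except ASCII letters, digits, hyphens and spaces.
    static int countWords(String str) {
        // Null check first: || short-circuits, so isEmpty() is never called on null.
        if (str == null || str.isEmpty()) {
            return 0;
        }
        // The hyphen is placed last in the character class so it is taken literally.
        String cleaned = str.replaceAll("[^a-zA-Z0-9 -]", "").trim();
        // "".split("\\s+") returns an array of length 1, so handle the empty result explicitly.
        if (cleaned.isEmpty()) {
            return 0;
        }
        return cleaned.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("this is high-tech job market in which? we make. careers"));
        System.out.println(countWords(null));     // no NullPointerException
        System.out.println(countWords("  ?!  ")); // nothing left after cleaning
        System.out.println(countWords("a - b"));  // the lone hyphen is counted as a word
    }
}
```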

You can get 10 words from your code like this: replace every non-word character (\W) with a space, making an exception for the hyphen.

public class Test {

    public static void main(String[] args) {
        String myString = "this is high-tech job market in which? we make. careers";
        myString = myString.replaceAll("[\\W&&[^\\-]]", " ");
        String[] arrWords = myString.split("\\s+");
        System.out.println(arrWords.length);
    }

}

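The advantage of \W is that it also matches Unicode punctuation, because it matches any character outside [a-zA-Z_0-9]. (By the same token, it matches non-ASCII letters unless Pattern.UNICODE_CHARACTER_CLASS is set.)

For example, if you have the characters ‘ „, \p{Punct} won't match them by default, because it covers only ASCII punctuation.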
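
A small sketch of that difference, using the two quotation marks mentioned above (the class name PunctVsNonWord and the helper names are hypothetical). With no flags set, the \p{Punct} version leaves the quotes in place, so they end up as separate tokens; the \W version turns them into spaces.

```java
class PunctVsNonWord {

    // Strips ASCII punctuation except '-' and counts whitespace-separated tokens.
    // Assumes the input contains at least one word.
    static int countWithPunct(String s) {
        return s.replaceAll("[\\p{Punct}&&[^-]]", "").trim().split("\\s+").length;
    }

    // Replaces every non-word character except '-' with a space, then counts tokens.
    // Assumes the input contains at least one word.
    static int countWithNonWord(String s) {
        return s.replaceAll("[\\W&&[^\\-]]", " ").trim().split("\\s+").length;
    }

    public static void main(String[] args) {
        // U+2018 and U+201E are the left single and low double quotation marks.
        String s = "job market \u2018 \u201E in which?";
        System.out.println(countWithPunct(s));   // the two quotes are counted as words
        System.out.println(countWithNonWord(s)); // the two quotes are dropped
    }
}
```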

Here is an alternative using Pattern.UNICODE_CHARACTER_CLASS if you need it:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        String myString = "this is high-tech job market ‘ „ in which? we make. careers";
        String[] arrWords2 = Pattern.compile("[\\p{Punct}&&[^-]]|\\s", Pattern.UNICODE_CHARACTER_CLASS).split(myString);
        List<String> arrayList = new ArrayList<String>(Arrays.asList(arrWords2));
        arrayList.removeAll(Arrays.asList("",null));
        System.out.println(arrayList);
        System.out.println(arrayList.size());
    }

}
