
Check if a word contains a number or special character

I am writing a program to count the total number of valid English words in a text file. I want to ignore words that contain numbers or special characters, e.g. "word123", "123word", "word&&", "$name". Currently my program correctly rejects words that start with a number, such as "123number", but it does not reject "number123". How should I proceed? Below is my code:

public int wordCounter(String filePath) throws FileNotFoundException {
    File f = new File(filePath);
    Scanner scanner = new Scanner(f);
    int nonWord = 0;
    int count = 0;
    String regex = "[a-zA-Z].*";

    while (scanner.hasNext()) {
        String word = scanner.next();
        if (word.matches(regex)) {
            count++;
        } else {
            nonWord++;
        }
    }
    return count;
}

Lose the dot:

String regex = "[a-zA-Z]*"; // more correctly "[a-zA-Z]+", but both work here because Scanner.next() never returns an empty token

The dot means "any character", so ".*" lets anything follow the first letter, which is why "number123" is accepted. You want a regex that means "composed only of letters".
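To see why the dot matters, here is a small sketch (assuming Java 9+ for `List.of`; the class name `DotRegexDemo` is only illustrative) that runs the sample tokens from the question through both patterns with `String.matches()`:

```java
import java.util.List;

public class DotRegexDemo {
    // Original pattern: the first character must be a letter, then ".*" accepts anything.
    static final String LOOSE = "[a-zA-Z].*";
    // Fixed pattern: one or more ASCII letters and nothing else.
    static final String STRICT = "[a-zA-Z]+";

    public static void main(String[] args) {
        List<String> samples = List.of("word", "word123", "123word", "word&&", "$name");
        for (String s : samples) {
            // String.matches() succeeds only if the WHOLE string matches the pattern.
            System.out.printf("%-8s loose=%-5b strict=%b%n",
                    s, s.matches(LOOSE), s.matches(STRICT));
        }
    }
}
```

Only "word" passes the strict pattern, while "word123" and "word&&" slip through the loose one because `.*` accepts whatever follows the first letter.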

BTW, you can also express this with a Unicode property class (arguably less readable, and not strictly equivalent, since it accepts non-ASCII letters too):

String regex = "\\p{L}+";

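The regex \p{L} (written "\\p{L}" in a Java string literal) means "any Unicode letter", so unlike [a-zA-Z] it also accepts letters such as é or ß.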
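A quick sketch contrasting the two character classes (the class and method names are hypothetical). Non-ASCII letters are written as `\u` escapes so the result does not depend on the source-file encoding:

```java
public class UnicodeLetterDemo {
    // ASCII letters only.
    static boolean isAsciiWord(String s) {
        return s.matches("[a-zA-Z]+");
    }

    // Any character in the Unicode "Letter" general category.
    static boolean isUnicodeWord(String s) {
        return s.matches("\\p{L}+");
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café" and "na\u00efve" is "naïve", both precomposed.
        String[] samples = {"hello", "caf\u00e9", "na\u00efve", "word123"};
        for (String s : samples) {
            System.out.println(s + ": ascii=" + isAsciiWord(s)
                    + ", unicode=" + isUnicodeWord(s));
        }
    }
}
```

One caveat: \p{L} matches precomposed letters such as U+00E9, but a decomposed "e" followed by a combining accent (U+0301) fails, because the combining mark belongs to category M, not L.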


To extend the expression to allow an apostrophe, which can appear at the start (e.g. 'tis), in the middle (e.g. can't) or at the end (e.g. Jesus'), but not more than once:

String regex = "(?!([^']*'){2})['\\p{L}]+";
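A sketch that exercises this pattern on the examples above plus two negative cases (the class and method names are illustrative):

```java
public class ApostropheDemo {
    // Letters and apostrophes only; the negative lookahead at the start rejects
    // any token that contains two or more apostrophes.
    static final String WORD_WITH_APOSTROPHE = "(?!([^']*'){2})['\\p{L}]+";

    static boolean isWord(String s) {
        return s.matches(WORD_WITH_APOSTROPHE);
    }

    public static void main(String[] args) {
        String[] samples = {"'tis", "can't", "Jesus'", "rock'n'roll", "don't!"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWord(s));
        }
    }
}
```

The lookahead runs once at the start of the token and fails the whole match if it can find two apostrophes anywhere in it, so "rock'n'roll" is rejected, and "don't!" fails because "!" is outside the character class. One quirk: a lone `'` also counts as a word.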

Use the regex ^[a-zA-Z-]+$ to match words. The hyphen in the character class also accepts hyphenated words such as well-known:

public int wordCounter(String filePath) throws FileNotFoundException {
    File f = new File(filePath);
    Scanner scanner = new Scanner(f);
    int nonWord = 0;
    int count = 0;
    String regex = "^[a-zA-Z-]+$";

    while (scanner.hasNext()) {
        String word = scanner.next();
        if (word.matches(regex)) {
            count++;
        } else {
            nonWord++;
        }
    }
    return count;
}
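A possible tidier variant of the method above, offered as a sketch rather than the answerer's exact code: it drops the redundant anchors, closes the Scanner with try-with-resources, and adds a hypothetical `countInText` helper so the logic can be tried without a file:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class HyphenWordCounter {
    // Letters and hyphens only. String.matches() must consume the whole token,
    // so the ^ and $ anchors from the answer are redundant (but harmless).
    static final String WORD = "[a-zA-Z-]+";

    // Counts the tokens produced by the scanner that match WORD.
    static int count(Scanner scanner) {
        int count = 0;
        while (scanner.hasNext()) {
            if (scanner.next().matches(WORD)) {
                count++;
            }
        }
        return count;
    }

    // File-based entry point; try-with-resources closes the Scanner and the file.
    static int wordCounter(String filePath) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(new File(filePath))) {
            return count(scanner);
        }
    }

    // Hypothetical helper for quick experiments: Scanner(String) tokenizes the text itself.
    static int countInText(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return count(scanner);
        }
    }

    public static void main(String[] args) {
        // Only "a" and "well-known" match; "end." keeps its trailing period.
        System.out.println(countInText("a well-known word123 $name can't end."));
    }
}
```

Because Scanner splits on whitespace only, punctuation stays attached to tokens ("end.", "can't"), so those are rejected too; whether that is acceptable depends on what counts as a valid English word for your program.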
