Tonight I'm trying to parse words from a file. I'd like to remove all punctuation while preserving uppercase and lowercase letters as well as whitespace.
String alpha = word.replaceAll("[^a-zA-Z]", "");
This removes everything that isn't a letter, including whitespace.
Run on a text file containing "Testing, testing, 1, one, 2, two, 3, three.", the output becomes TESTINGTESTINGONETWOTHREE.
However, when I change it to
String alpha = word.replaceAll("[^a-zA-Z\\s]", "");
The output does not change.
Here is the complete code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class UpperCaseScanner {
    public static void main(String[] args) throws FileNotFoundException {
        // First, define the file path the program will look for.
        String filename = "file.txt";
        String workingDir = System.getProperty("user.dir");
        String targetFile = workingDir + File.separator + filename; // Full file path.
        //System.out.println(targetFile); // Debug code: prints the file path.
        Scanner fileScan = new Scanner(new File(targetFile));
        while (fileScan.hasNext()) {
            String word = fileScan.next();
            // Remove every character that is not a letter or whitespace.
            String alpha = word.replaceAll("[^a-zA-Z\\s]", "");
            System.out.print(alpha.toUpperCase());
        }
        fileScan.close();
    }
}
file.txt has one line, reading Testing, testing, 1, one, 2, two, 3, three.
My goal is for the output to read Testing Testing One Two Three.
Am I just doing something wrong in the regular expression, or is there something else I need to do? If it's relevant, I'm working in 32-bit Eclipse 2.0.2.2.
Depending on what you want to keep, one of these patterns should do it (str is the input string):
System.out.println(str.replaceAll("\\p{P}", ""));          // Removes Unicode punctuation only (not symbols such as $ or +)
System.out.println(str.replaceAll("[^a-zA-Z]", ""));       // Removes spaces, special characters and digits
System.out.println(str.replaceAll("[^a-zA-Z\\s]", ""));    // Removes special characters and digits, keeps whitespace
System.out.println(str.replaceAll("\\s+", ""));            // Removes whitespace only
System.out.println(str.replaceAll("\\p{Punct}", ""));      // Removes ASCII punctuation only
System.out.println(str.replaceAll("\\W", ""));             // Removes spaces and special characters, keeps digits and underscores
System.out.println(str.replaceAll("\\p{Punct}+", ""));     // Same result as \p{Punct}, but matches runs of punctuation at once
System.out.println(str.replaceAll("\\p{Punct}|\\d", ""));  // Removes ASCII punctuation and digits
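To see the differences side by side, here is a small sketch that applies each pattern from the list to the sample line from the question and prints what remains. The class name RegexComparison and the compare helper are illustrative, not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RegexComparison {
    // Applies replaceAll(pattern, "") for each pattern and records the result,
    // keeping the patterns in insertion order (hypothetical helper).
    static Map<String, String> compare(String str) {
        String[] patterns = {
            "\\p{P}", "[^a-zA-Z]", "[^a-zA-Z\\s]", "\\s+",
            "\\p{Punct}", "\\W", "\\p{Punct}+", "\\p{Punct}|\\d"
        };
        Map<String, String> results = new LinkedHashMap<>();
        for (String pattern : patterns) {
            results.put(pattern, str.replaceAll(pattern, ""));
        }
        return results;
    }

    public static void main(String[] args) {
        String str = "Testing, testing, 1, one, 2, two, 3, three.";
        // Prints each regex followed by the text it leaves behind.
        compare(str).forEach((pattern, result) ->
            System.out.printf("%-16s -> \"%s\"%n", pattern, result));
    }
}
```

For the sample line, \W leaves Testingtesting1one2two3three, while [^a-zA-Z\s] leaves Testing testing  one  two  three, with doubled spaces where the digits used to be.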
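The regular expression itself is fine; the whitespace never reaches it. Scanner.next() splits the input on whitespace and returns each token without the surrounding spaces, and print() then writes the tokens back to back (the all-caps output comes from toUpperCase()). Applying the regex to the whole line instead gives the output you were looking for. I wasn't sure whether you needed runs of spaces collapsed into a single space, so I added a second replaceAll call that does that.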
public class RemovePunctuation {
    public static void main(String[] args) {
        String input = "Testing, testing, 1, one, 2, two, 3, three.";
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
        System.out.println(alpha);
    }
}
This outputs:
Testing testing one two three
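If you would rather keep the token-by-token Scanner loop from the question, a minimal sketch is to strip each token, skip tokens that end up empty (the digits), and put the spaces back yourself, since next() has already discarded them. The class name TokenCleaner and the clean helper are hypothetical, and the demo reads from a String instead of file.txt so it is self-contained; passing new Scanner(new File(targetFile)) would read the real file.

```java
import java.util.Scanner;
import java.util.StringJoiner;

public class TokenCleaner {
    // Strips everything but letters from each whitespace-delimited token and
    // re-joins the surviving words with single spaces.
    static String clean(Scanner scanner) {
        StringJoiner joiner = new StringJoiner(" ");
        while (scanner.hasNext()) {
            // next() returns the token without surrounding whitespace,
            // so there is no whitespace left for a regex to preserve.
            String word = scanner.next().replaceAll("[^a-zA-Z]", "");
            if (!word.isEmpty()) { // a token like "1," becomes "" and is skipped
                joiner.add(word);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Demo input from a String; the question's code reads file.txt instead.
        try (Scanner fileScan = new Scanner("Testing, testing, 1, one, 2, two, 3, three.")) {
            System.out.println(clean(fileScan)); // prints "Testing testing one two three"
        }
    }
}
```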
If you want the first letter of each word capitalized (as shown in your question), you could do this:
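public class Foo {
    public static void main(String[] args) {
        String input = "Testing, testing, 1, one, 2, two, 3, three.";
        // Keep only letters and whitespace, then collapse runs of whitespace.
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ").trim();
        System.out.println(alpha);

        StringBuilder upperCaseWords = new StringBuilder();
        String[] words = alpha.split("\\s");
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // charAt(0) would throw on an empty string
            }
            // Capitalize the first letter and keep the rest of the word as-is.
            String upperCase = Character.toUpperCase(word.charAt(0)) + word.substring(1) + " ";
            upperCaseWords.append(upperCase);
        }
        System.out.println(upperCaseWords.toString().trim()); // trim() drops the trailing space
    }
}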
Which outputs:
Testing testing one two three
Testing Testing One Two Three
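A more compact variant, sketched here with streams (assuming Java 8 or later), also lowercases the rest of each word, so input like TESTING comes out as Testing, and uses Collectors.joining so no trailing space is produced. The names TitleCase and titleCase are illustrative only.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class TitleCase {
    // Keeps only letters and whitespace, then capitalizes each word and
    // lowercases the remainder, joining the words with single spaces.
    static String titleCase(String input) {
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").trim();
        if (alpha.isEmpty()) {
            return ""; // "".split(...) would yield one empty word
        }
        return Arrays.stream(alpha.split("\\s+"))
                .map(w -> Character.toUpperCase(w.charAt(0)) + w.substring(1).toLowerCase())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // prints "Testing Testing One Two Three"
        System.out.println(titleCase("Testing, testing, 1, one, 2, two, 3, three."));
    }
}
```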
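Java's regex syntax also supports the \p{Punct} character class, which matches the ASCII punctuation characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~, so word.replaceAll("\\p{Punct}", "") removes them.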
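Keep in mind that \p{Punct} only covers US-ASCII. For non-ASCII punctuation the Unicode general category \p{P} may fit better, although it does not match symbols such as $ or +. A small sketch comparing the two on one string (class and method names are illustrative):

```java
public class PunctClasses {
    // \p{Punct}: POSIX class, US-ASCII punctuation only (includes $ and +).
    static String stripPosixPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // \p{P}: Unicode "Punctuation" category; matches the inverted exclamation
    // mark, but not currency or math symbols such as $ or +.
    static String stripUnicodePunct(String s) {
        return s.replaceAll("\\p{P}", "");
    }

    public static void main(String[] args) {
        String s = "\u00A1Hola! $5 + 3"; // starts with an inverted exclamation mark
        System.out.println(stripPosixPunct(s));   // keeps the inverted mark, drops ! $ +
        System.out.println(stripUnicodePunct(s)); // prints "Hola $5 + 3"
    }
}
```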