
What is the best way to break up a text file in Java given certain limitations on what a valid string is?

I have been reading a lot of questions and answers about using delimiters and patterns, but I am still having a lot of trouble figuring this one out. I want to read a text file that may or may not be jumbled up and pick the words out of it. For example, with input such as this:

"the.dog,jumped over the hole@bob's house"

This would give me the following words

[the, dog, jumped, over, the, hole, bob's, house]

I would then do something with each word.

Scanner s1 = new Scanner(new File(fileName)); // new Scanner(fileName) would scan the name itself
while (s1.hasNext()) {
    String temp = s1.next(/* pattern = "no clue" */);
    // do something with temp
}

I feel like a pattern would be the best way to do it, but how would I make a pattern that matches any sequence of characters, as long as it starts with a letter, and ends when it reaches any of these characters: . , * % " ( ) & $ ? < > ! - : ; @ # or any type of whitespace?

I know I could do it in a very ugly way with very crappy runtime efficiency. Any help would be greatly appreciated, as would a pointer to another question that may help and that I haven't found.
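Taking the question literally (a word starts with a letter and stops at one of the listed characters or at whitespace), the words can also be matched directly with Matcher.find() instead of describing the delimiters. A minimal sketch under that reading; the class name WordMatcher and the method words are hypothetical, and the stop characters are copied from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatcher {
    // A word starts with a Unicode letter and runs until one of the question's
    // stop characters or any whitespace. '-' is placed last so it is literal.
    private static final Pattern WORD =
            Pattern.compile("\\p{L}[^.,*%\"()&$?<>!:;@#\\s-]*");

    static List<String> words(CharSequence text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {          // scan left to right for the next word
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("the.dog,jumped over the hole@bob's house"));
    }
}
```

To run it over a file, the contents could first be read with Files.readString(Path.of(fileName)) (Java 11 or later) and passed to words. Because a match must begin with a letter, leading punctuation is simply skipped rather than producing empty tokens.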

Something like the following should work:

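// needs java.io.File and java.util.Scanner; the caller must handle FileNotFoundException
// new Scanner(fileName) would tokenize the name itself, so wrap it in a File
// Delimiter: any run of characters that is neither a letter (\p{L}) nor an apostrophe
try (Scanner s1 = new Scanner(new File(fileName)).useDelimiter("[^\\p{L}']+")) {
    while (s1.hasNext()) {
        String temp = s1.next();
        System.out.println(temp);
    }
}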
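To check what the [^\\p{L}']+ delimiter produces without a file, the same Scanner can be pointed at the sample sentence directly, since new Scanner(String) reads the string itself. A sketch; ScannerWords and tokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerWords {
    // Everything that is not a letter or an apostrophe is a separator;
    // the trailing + makes a whole run of such characters one delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner s = new Scanner(text).useDelimiter("[^\\p{L}']+")) {
            while (s.hasNext()) {      // leading/trailing separators are skipped
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("the.dog,jumped over the hole@bob's house"));
        System.out.println(tokens("...the....dog,,,jumped!!"));
    }
}
```

Note that digits also count as separators with this pattern, so abc123def yields abc and def; whether that is desired depends on what counts as a word.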

I think all you need is to specify your delimiters with scanner.useDelimiter. Here is an example that splits your test sentence as you specified (using the dot, the comma, @ and whitespace as delimiters). You can add more delimiters to the pattern expression as needed.

Scanner scanner = new Scanner("the.dog,jumped over the hole@bob's house");
scanner.useDelimiter("\\.|\\,|\\@|\\s");

while (scanner.hasNext()) {
    String temp = scanner.next();
    System.out.println(temp);
}

If you want to ignore repeated delimiters, e.g. "the....dog,,,jumped", you can use scanner.useDelimiter("\\.+|\\,+|\\@+|\\s+"); as the delimiter pattern, which only adds + after each delimiter.
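The repeated-delimiter pattern (written in Java source as "\\.+|\\,+|\\@+|\\s+") only collapses runs of the same character: for a mixed run such as ".," one alternative matches the dot and another the comma, so Scanner returns an empty token between them. A character class with a single + treats any mix of the listed characters as one delimiter. A sketch; MixedDelimiters and tokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class MixedDelimiters {
    // One character class, repeated: any run made of '.', ',', '@' and
    // whitespace, in any mix, is a single delimiter ('.' is literal in a class).
    static final String DELIMS = "[.,@\\s]+";

    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner s = new Scanner(text).useDelimiter(DELIMS)) {
            while (s.hasNext()) {
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("the....dog,,,jumped., over@ @the hole"));
    }
}
```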

You can set a delimiter on the scanner, and that should do the job for you.

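Scanner s = new Scanner("the.dog,jumped over. the hole@bob's house.in land");
// The trailing + makes a run such as ". " one delimiter; without it, "over. the" yields an empty token
String pattern = "(\\s|\\.|,|@)+";
s.useDelimiter(pattern);
while (s.hasNext()) {
  String temp = s.next();
  //do something with temp
}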

You can add all your delimiters to the pattern string. Characters that have a special meaning in regex, like . (dot), must be escaped with a backslash, which is written as \\ inside a Java string literal; for a detailed list of such characters please refer to this link.
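Instead of escaping each special character by hand, the delimiter pattern can be built from a plain list of separator characters with Pattern.quote, which turns each one into a literal. A sketch, assuming the separators are the ones listed in the question; QuotedDelimiters, delimiterFor and words are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedDelimiters {
    // Builds "(?:\Q.\E|\Q,\E|...|\s)+" from a plain string of separator characters,
    // so metacharacters like '.', '*', '(' and '$' need no manual escaping.
    static Pattern delimiterFor(String separators) {
        String alternatives = separators.codePoints()
                .mapToObj(cp -> Pattern.quote(new String(Character.toChars(cp))))
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?:" + alternatives + "|\\s)+");
    }

    static List<String> words(String text, String separators) {
        List<String> result = new ArrayList<>();
        for (String token : delimiterFor(separators).split(text)) {
            if (!token.isEmpty()) {   // a leading separator yields one empty token
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The separator characters listed in the question, as one plain string.
        String separators = ".,*%\"()&$?<>!-:;@#";
        System.out.println(words("(the.dog,jumped over*the hole@bob's house)", separators));
    }
}
```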

Keep it simple:

// The trailing + treats a run such as ". " as one separator, so no empty strings appear in the middle
String[] a = "the.dog,jumped over. the hole@bob's house.in land".split("(\\s|\\.|,|@)+");
for (String temp : a) {
  //do something with temp
}

split() accepts regular expressions, so use it.
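One difference from Scanner worth knowing: String.split keeps a leading empty string when the input begins with a separator (trailing empty strings are dropped), whereas Scanner skips leading delimiters. And with a single-character pattern such as "\\s|\\.|,|@", adjacent separators like ". " also produce an empty string between them. A minimal sketch that splits on a whole run of separators and filters out any remaining empty element; SplitWords and words are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitWords {
    // Split on any run of whitespace, '.', ',' or '@'; the + stops adjacent
    // separators (e.g. ". ") from producing empty strings in the middle.
    static List<String> words(String text) {
        return Arrays.stream(text.split("[\\s.,@]+"))
                .filter(s -> !s.isEmpty())   // drops the "" from a leading separator
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(" the.dog".split("[\\s.,@]+"))); // [, the, dog]
        System.out.println(words("the.dog,jumped over. the hole@bob's house.in land"));
    }
}
```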

