
How to use a delimiter to isolate words (Java)

I am writing a program that scans text files and then writes each word into a HashMap.

The Scanner class uses whitespace as its default delimiter, so my words ended up stored with punctuation attached to them. I want the scanner to treat periods, commas, and other common punctuation marks as token boundaries. Here's what I have attempted:

    Scanner line_scanner = new Scanner(line).useDelimiter("[.,:;()?!\" \t]+~\\s");

The scanner basically ignored all the spaces even though I have '\\s' as part of the expression. Sorry, but I have hardly any understanding of regex.

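The attempted pattern only matches a run of punctuation that is followed by a literal '~' and then a whitespace character, so it almost never matches. Move the whitespace class inside the brackets and drop the '~':

    Scanner line_scanner = new Scanner(line).useDelimiter("[.,:;()?!\"\\s]+");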
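To make the difference concrete, here is a minimal sketch that runs both patterns over the same line. The class name DelimiterDemo, the helper tokens, and the sample sentence are illustrative assumptions, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Hypothetical helper: collects every token the Scanner yields
    // for the given line and delimiter regex.
    public static List<String> tokens(String line, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line).useDelimiter(delimiterRegex)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Hello, world! (This is a test.)";

        // Pattern from the question: needs a literal '~' after the
        // punctuation run, so nothing in this line counts as a delimiter.
        System.out.println(tokens(line, "[.,:;()?!\" \t]+~\\s"));
        // prints [Hello, world! (This is a test.)]

        // Corrected pattern: any run of the listed punctuation or whitespace.
        System.out.println(tokens(line, "[.,:;()?!\"\\s]+"));
        // prints [Hello, world, This, is, a, test]
    }
}
```

The first call returns the whole line as a single token, which matches the symptom described in the question. The second call returns the bare words, which can then be put into a HashMap.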

Alternatively, you can treat everything that is not a Unicode letter as a delimiter:

    useDelimiter("[^\\p{L}\\p{M}]+");

([^...] negates the character class, \p{...} matches a Unicode category, L is the letters, and M is the combining diacritical marks (accents). Note that digits also count as delimiters with this pattern.)
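A short sketch of the Unicode-aware delimiter in use follows. The class name UnicodeWordsDemo, the method words, and the sample text are assumptions made for illustration. The accented letters are written as Unicode escapes so that the example does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class UnicodeWordsDemo {
    // Hypothetical helper: splits a line on every run of characters that are
    // neither Unicode letters (L) nor combining marks (M).
    public static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line).useDelimiter("[^\\p{L}\\p{M}]+")) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "naïve café, 2 cups!" with the accented letters escaped.
        // The digit 2 is not a letter, so it is swallowed by the delimiter.
        List<String> result = words("na\u00efve caf\u00e9, 2 cups!");
        System.out.println(result.size()); // prints 3
    }
}
```

Accented words stay intact with this pattern, whereas apostrophes, hyphens, and digits all split or drop out, so it fits best when only alphabetic words matter.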
