Regular expression for hyphen-mixed words

I can use string.split("\\W+") to split a string into words that contain only word characters.

However:

  1. I don't want to break words such as "re-use" into "re" and "use".
     The same goes for words with multiple hyphens, such as "out-of-the-way".

  2. I do want to break "and--oh" into "and" and "oh".

How can I achieve that?

Try this regex:

string.split("[^\\w\\-]+|--+")
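
As a quick check of the regex above, here is a minimal sketch that applies it to the examples from the question. The class name HyphenSplitDemo, the helper splitWords, and the sample sentence are made up for illustration; only the pattern comes from the answer.

```java
import java.util.Arrays;

class HyphenSplitDemo {
    // Splits on either:
    //   [^\w\-]+  a run of characters that are neither word characters nor hyphens
    //   --+       a run of two or more hyphens
    // A single hyphen between word characters is never a separator.
    static String[] splitWords(String text) {
        return text.split("[^\\w\\-]+|--+");
    }

    public static void main(String[] args) {
        String sample = "re-use the out-of-the-way path, and--oh well";
        // Prints: [re-use, the, out-of-the-way, path, and, oh, well]
        System.out.println(Arrays.toString(splitWords(sample)));
    }
}
```

The second alternative is what separates "and--oh", because the first alternative deliberately excludes the hyphen.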

You can first replace each run of consecutive hyphens with a special character, and then do a simple regex split.

Please refer to the code below.

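public class Test {
    public static void main(String[] args) {
        String str = "This is^^some@@words-apple-banana--orange";
        // Step 1: replace every run of two or more hyphens with "@",
        // a character that the split regex below treats as a separator.
        str = str.replaceAll("-{2,}", "@");
        System.out.println(str);
        // Step 2: split on runs of characters that are neither
        // word characters nor hyphens, so single hyphens stay inside words.
        String regex = "[^\\w-]+";
        String[] arr = str.split(regex);
        for (String item : arr) {
            System.out.println(item);
        }
    }
}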

The result is:

This is^^some@@words-apple-banana@orange
This
is
some
words-apple-banana
orange
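
To tie the two-step approach back to the question, the sketch below wraps it in a helper and runs it on the three words the question asks about. The class name TwoStepSplitDemo and the method splitWords are hypothetical names, not part of the answer; the two regexes are the ones used in the code above.

```java
import java.util.Arrays;

class TwoStepSplitDemo {
    static String[] splitWords(String text) {
        // Step 1: turn each run of two or more hyphens into "@".
        String replaced = text.replaceAll("-{2,}", "@");
        // Step 2: split on anything that is neither a word character nor a hyphen.
        return replaced.split("[^\\w-]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("re-use")));         // [re-use]
        System.out.println(Arrays.toString(splitWords("out-of-the-way"))); // [out-of-the-way]
        System.out.println(Arrays.toString(splitWords("and--oh")));        // [and, oh]
    }
}
```

One thing to keep in mind with either approach: if the input starts with a separator, String.split returns an empty string as the first element, so callers may want to skip empty tokens.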
