Regular expression to identify separators between words

Question

I am trying to separate the words in a text. I need to split on anything that sits between them, so I wrote a regular expression that works almost as it should.

Words are alphabetic strings that can contain dashes (-), but they cannot start or end with a dash. Words cannot contain numerals or any character other than single dashes and [a-zA-Z].

This is what I have come up with so far:

/(-[^a-zA-Z])|\w*\d\w*|[^a-zA-Z-]+/ig

This, however, does not work correctly when a dash comes directly before a word, as in this case:

123-word

Here the expression should match:

123-

Any help on this would be greatly appreciated, thanks!

Update

Sorry, I was rather vague. I need to match what is between the words, not the words themselves, so that I can split the text into an array later on.

This is what the expression above matches so far: [image: current matches]

... and this is what it should match: [image: expected matches]

Notice the difference in matching on the second line of text (123-). Sorry for not being specific enough.

Answer 1

You can use this regex:

/(?<=[^\w-]|^)(?!-)([a-z-]+)(?<!-)(?=[^\w-]|$)/gi

Given the following input:

abc-def word A -notword xyz notword-

The above regex will match the following words:

abc-def
word
A
xyz

Working demo
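
As a rough illustration, the pattern above can be run in JavaScript along these lines. This is only a sketch: the names WORD_RE and findWords are made up here, and it assumes an engine with lookbehind support (ES2018 or later), which not every JavaScript runtime provides.

```javascript
// Word pattern from the answer: letters and dashes, not starting or ending
// with a dash, delimited by the start/end of the input or by a character
// that is neither a word character nor a dash.
// Assumption: the engine supports lookbehind assertions (ES2018+).
const WORD_RE = /(?<=[^\w-]|^)(?!-)([a-z-]+)(?<!-)(?=[^\w-]|$)/gi;

// Hypothetical helper: returns every substring matched by WORD_RE.
function findWords(text) {
  return Array.from(text.matchAll(WORD_RE), (m) => m[0]);
}

const sample = 'abc-def word A -notword xyz notword-';
console.log(findWords(sample));
// [ 'abc-def', 'word', 'A', 'xyz' ]
```

Two things are worth checking against your own data. The character class [a-z-]+ also accepts consecutive dashes, so a token such as a--b would be reported as a word. And for the input 123-word from the question this pattern finds no word at all, because word is preceded by a dash.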

UPDATE: Based on the edited question, you can use this regex for splitting:

/([^\w-].*?)(?=(?<=[^\w-]|^)(?!-)[a-z-]+(?<!-)(?=[^\w-]|$))/gis

Working demo

Answer 2

If I understood your question correctly: instead of searching for the valid matches (the ones you want), I replaced all the invalid matches.

Have a look at this demo. It matches everything that is invalid according to your question, as I understood it:

"Words are alphabetic strings that can contain dashes (-), they cannot start with dashes or end with dashes. Words cannot contain numerals or any other character besides single dashes and [a-zA-Z]."

This is the code:

var str = 'word word-ed, [word-ing] 123-word w-word, word-. w0rd w14rd 124eword 1234word finished.'
str.replace(/(\b[\d]+-[a-zA-Z]+\b)|(\b[\d]+[a-zA-Z]+)|(\b[a-zA-Z]+[\d]+[a-zA-Z]+)|(\b[a-zA-Z]+-[.,]|([\[\],.]))/g, '').split(/\s+/)

Output:

["word", "word-ed", "word-ing", "w-word", "finished"]

Explanation:

Search for the invalid matches:

str.match(/(\b[\d]+-[a-zA-Z]+\b)|(\b[\d]+[a-zA-Z]+)|(\b[a-zA-Z]+[\d]+[a-zA-Z]+)|(\b[a-zA-Z]+-[.,]|([\[\],.]))/g)
//output
[",", "[", "]", "123-word", ",", "word-.", "w0rd", "w14rd", "124eword", "1234word", "."]

Replace them with an empty string:

var temp = str.replace(/(\b[\d]+-[a-zA-Z]+\b)|(\b[\d]+[a-zA-Z]+)|(\b[a-zA-Z]+[\d]+[a-zA-Z]+)|(\b[a-zA-Z]+-[.,]|([\[\],.]))/g, '')
//output
"word word-ed word-ing  w-word      finished"

Split the result on whitespace:

temp.split(/\s+/)
//output
["word", "word-ed", "word-ing", "w-word", "finished"]
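
The three steps above could be wrapped in a single function, roughly as sketched below. The names INVALID_RE and extractWords are hypothetical, and the extra filter step is an addition not present in the answer: it drops the empty string that split produces when the text starts or ends with a removed token.

```javascript
// Same "invalid token" pattern as in the answer, kept in one place.
const INVALID_RE = /(\b[\d]+-[a-zA-Z]+\b)|(\b[\d]+[a-zA-Z]+)|(\b[a-zA-Z]+[\d]+[a-zA-Z]+)|(\b[a-zA-Z]+-[.,]|([\[\],.]))/g;

// Hypothetical helper: strip invalid tokens, split on whitespace,
// then discard empty strings left over at the edges.
function extractWords(text) {
  return text
    .replace(INVALID_RE, '')
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// The input starts with an invalid token, so without the filter step
// the result would begin with an empty string.
console.log(extractWords('123-word word-ed, finished.'));
// [ 'word-ed', 'finished' ]
```

Like the original snippet, this only removes the kinds of invalid token listed in the pattern, so other punctuation or token shapes would need their own alternatives.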


Answer 1
0 2014-06-18 14:00:30

Answer 2
0 Accepted 2014-06-18 15:22:00