
Regex select words longer than 4 characters but only one instance if duplicates

I am trying to format text in InDesign using a GREP Style. The goal is to select words of 4 or more letters in a paragraph, but if a word is repeated within the paragraph, only its first instance should be selected. This is the sample text:

"The Lord's right hand is lifted high; the Lord's right hand has done mighty things!"

The solution should give:

  • Lord right hand lifted high done mighty things

I have done the first part:

[[:word:]]{4,}

but I don't have a clue how to deal with the duplicates.

Is order a requirement? If not, how about matching words of 4 or more characters that are not followed by the same word later in the text? Note that this selects the last occurrence of each repeated word rather than the first. See:

([[:word:]]{4,})(?!.*\1)

https://regex101.com/r/Ug4dLZ/1

Result: lifted high Lord right hand done mighty things
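To check this behaviour outside InDesign, here is a minimal Python sketch. It is only an approximation of the GREP Style engine: Python's `re` module does not support POSIX bracket classes, so `\w` is assumed as a stand-in for `[[:word:]]`, and the variable names are illustrative.

```python
import re

text = ("The Lord's right hand is lifted high; "
        "the Lord's right hand has done mighty things!")

# \w stands in for [[:word:]] (assumption: Python's re has no POSIX classes).
# The negative lookahead rejects a word when the same characters occur
# again later on the line, so only the last occurrence survives.
pattern = re.compile(r"(\w{4,})(?!.*\1)")

matches = pattern.findall(text)
print(matches)
# ['lifted', 'high', 'Lord', 'right', 'hand', 'done', 'mighty', 'things']
```

The first "Lord", "right" and "hand" are skipped because each one appears again further on, which is why the matches come out in a different order from the one requested in the question.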

To be more comprehensive, add word boundaries to the lookahead so that a word is rejected only when the whole word recurs (i.e., "Person" and "Personhood" count as 2 separate words):

([[:word:]]{4,})(?!.*\b\1\b)
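The effect of the word boundaries can be seen in a small Python sketch. The sample string is hypothetical, and `\w` is again assumed as a substitute for `[[:word:]]`, which Python's `re` module does not support.

```python
import re

# Hypothetical sample: "Person" is a prefix of the later word "Personhood".
text = "Person and Personhood"

# Without boundaries, "Person" is rejected because the same characters
# occur again inside "Personhood".
loose = re.compile(r"(\w{4,})(?!.*\1)")

# With \b on both sides, only a later whole-word repeat causes rejection.
strict = re.compile(r"(\w{4,})(?!.*\b\1\b)")

print(loose.findall(text))   # ['Personhood']
print(strict.findall(text))  # ['Person', 'Personhood']
```

With the boundaries in place, both words are kept, since "Person" never recurs as a complete word.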


