How to match a whole word in Tcl? regexp "\msub1\M" sub1_ex

Please help me solve this.

set var1 sub1
set var2 sub
set var3 sub1_ex

I want $var1 to match $var3, but $var2 should not match it, i.e.:

regexp $var1 $var3 should return 1; regexp $var2 $var3 should return 0, but I am getting 1.

I also tried:

regexp "\\m$var1\\M" $var3

but got 0.

Okay, I think I finally managed to parse the question.

The first problem is that "sub" is a substring of "sub1", and they're both substrings of "sub1_ex".

The second problem is that "words", in terms of the regular expression engine, are contiguous blocks of adjacent characters matching class \w, which includes both alphanumerics and the underscore (as documented in Tcl's re_syntax manual page). So if you use \m and \M to anchor the pattern "sub1", the string "sub1_ex" will not match, as there's no word boundary between "1" and "_".

What to try next really depends on your use case, and unfortunately I'm not quite getting it. Changing var2 to read sub\M would probably fix it, but I'm not sure it's what you want.
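To make the two problems above concrete, here is a minimal sketch using the variables from the question. The result variable names (plain1, plain2, anchored, spaced, endOnly) are made up for illustration only.

```tcl
set var1 sub1
set var2 sub
set var3 sub1_ex

# Unanchored patterns match anywhere inside the string, so both succeed.
set plain1 [regexp $var1 $var3]
set plain2 [regexp $var2 $var3]

# "_" is a word character, so there is no \M boundary after "sub1" here.
set anchored [regexp "\\m$var1\\M" $var3]

# With a space instead of "_" the same pattern does find a boundary.
set spaced [regexp "\\m$var1\\M" "sub1 ex"]

# Anchoring only the end of "sub" stops it from matching inside "sub1".
set endOnly [regexp "$var2\\M" $var3]

puts "$plain1 $plain2 $anchored $spaced $endOnly"   ;# prints: 1 1 0 1 0
```

The last line corresponds to the sub\M suggestion: it rejects "sub" because the next character, "1", is still part of the same word.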

The problem you've got is that the word character class includes the underscore character, so the special "at start/end of word" patterns don't work for you.

A partial solution is to use a more elaborate match:

regexp "\\m${var1}(?!\[a-zA-Z0-9\])" $var3

This works at the end of a word, but not at the start, because the RE engine used in Tcl does not support any kind of lookbehind constraint. Thus, it is actually simpler to transform the string being matched against:

regexp "\\m$var1\\M" [string map {"_" " "} $var3]

That will work fine provided the string you're trying to find doesn't include an underscore. I guess that's true in your case. If not, you need a real trick: replace the underscore with some really rare character in both the pattern and the string being matched:

set mapping {"_" "\ufffd"};   # Unicode replacement char!
regexp "\\m[string map $mapping $var1]\\M" [string map $mapping $var3]
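The mapping trick above could be wrapped in a small helper. This is only a sketch: the proc name matchesToken is hypothetical, and it assumes the word being searched for contains only letters, digits and underscores (no regular expression metacharacters) and that U+FFFD never occurs in the real data.

```tcl
# Hypothetical helper: returns 1 if $word occurs in $text as a whole token,
# treating "_" as a separator rather than as a word character.
proc matchesToken {word text} {
    # Swap every underscore for U+FFFD, which is not a word character.
    set mapping [list _ \ufffd]
    set pattern "\\m[string map $mapping $word]\\M"
    return [regexp -- $pattern [string map $mapping $text]]
}

puts [matchesToken sub1 sub1_ex]   ;# prints 1
puts [matchesToken sub  sub1_ex]   ;# prints 0
puts [matchesToken ex   sub1_ex]   ;# prints 1
```

Because the underscore is replaced on both sides, the \m and \M constraints see a boundary wherever the original text had an underscore, so the start of a token is handled as well as the end.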
