Find the last occurrence of a word

Question

I have the following string:

<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind

I want to find the last "SEM" start tag before the "PARTITION" tag (the SEM start tag, not the end tag). The result should be:

<SEM>is<I>love</I>, <PARTITION />

I have tried this regular expression:

<SEM>[^<]*<PARTITION[ ]/>

but it only works if there is no other tag between the final "SEM" tag and the "PARTITION" tag. Any ideas?

Answer 1

Use String.IndexOf to find PARTITION and String.LastIndexOf to find SEM?

int partitionIndex = text.IndexOf("<PARTITION");
// Search backwards from the PARTITION tag for the last <SEM> start tag.
int semIndex = text.LastIndexOf("<SEM>", partitionIndex);
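
Here is a minimal sketch of how those two indices could be combined to pull out the span the question asks for. The names `TagSpan` and `LastSemBeforePartition` are made up for illustration, and treating the next "/>" as the end of the PARTITION tag is an assumption about the input.

```csharp
using System;

static class TagSpan
{
    // Hypothetical helper: returns the text from the last <SEM> start tag
    // before <PARTITION through the end of the PARTITION tag, or null.
    public static string LastSemBeforePartition(string text)
    {
        int partitionIndex = text.IndexOf("<PARTITION", StringComparison.Ordinal);
        if (partitionIndex < 0) return null;

        // LastIndexOf searches backwards, starting at partitionIndex.
        int semIndex = text.LastIndexOf("<SEM>", partitionIndex, StringComparison.Ordinal);
        if (semIndex < 0) return null;

        // Assumption: the PARTITION tag ends at the next "/>".
        int end = text.IndexOf("/>", partitionIndex, StringComparison.Ordinal);
        if (end < 0) return null;

        return text.Substring(semIndex, end + 2 - semIndex);
    }

    static void Main()
    {
        string text = "<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind";
        // Prints: <SEM>is<I>love</I>, <PARTITION />
        Console.WriteLine(LastSemBeforePartition(text));
    }
}
```

The explicit -1 checks matter because IndexOf returns -1 when the tag is missing, and that value should not be passed on as a start index.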

Answer 2

And here's your goofy regex!

(?=[\s\S]*?\<PARTITION)(?![\s\S]+?\<SEM\>)\<SEM\>

What that says is "While ahead somewhere is a PARTITION tag... but while ahead is NOT another SEM tag... match a SEM tag."

Enjoy!

Here's that regex broken down:

(?=[\s\S]*?\<PARTITION) means "While ahead somewhere is a PARTITION tag"
(?![\s\S]+?\<SEM\>) means "While ahead somewhere is not a SEM tag"
\<SEM\> means "Match a SEM tag"
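
A small sketch of how the breakdown above plays out in .NET, assuming the sample string from the question. The class and method names are hypothetical. Note that the negative lookahead rejects any later SEM start tag, including one that comes after the PARTITION tag, so in that case nothing matches at all.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // The pattern from this answer, written as a verbatim string.
    const string Pattern = @"(?=[\s\S]*?\<PARTITION)(?![\s\S]+?\<SEM\>)\<SEM\>";

    // Hypothetical helper: index of the matched <SEM> tag, or -1 if no match.
    public static int FindLastSem(string text)
    {
        Match m = Regex.Match(text, Pattern);
        return m.Success ? m.Index : -1;
    }

    static void Main()
    {
        string text = "<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind";
        int index = FindLastSem(text);
        Console.WriteLine(index >= 0 ? text.Substring(index) : "no match");

        // A <SEM> after the PARTITION tag makes the pattern fail everywhere.
        Console.WriteLine(FindLastSem(text + " <SEM>later</SEM>")); // -1
    }
}
```

The regex only matches the tag itself, so the caller still has to slice the string from the returned index.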

Answer 3

If you are using a regular expression to find the last occurrence of something, you may also want to use the right-to-left parsing regex option:

new Regex("...", RegexOptions.RightToLeft);
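
As a rough sketch of what that could look like for the string in the question: the pattern below is my own assumption (the answer leaves it as "..."), and the class and method names are hypothetical. With RegexOptions.RightToLeft the engine scans from the end of the input, so the lazy `.*?` stretches leftwards only as far as the nearest SEM start tag.

```csharp
using System;
using System.Text.RegularExpressions;

static class RightToLeftDemo
{
    // Assumed pattern, not taken from the answer: a <SEM> start tag,
    // a lazy gap, then the PARTITION tag. Singleline lets '.' cross newlines.
    static readonly Regex Pattern = new Regex(
        @"<SEM>.*?<PARTITION\s*/>",
        RegexOptions.RightToLeft | RegexOptions.Singleline);

    // Hypothetical helper: the matched span, or null if nothing matches.
    public static string Find(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        string text = "<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind";
        // Prints: <SEM>is<I>love</I>, <PARTITION />
        Console.WriteLine(Find(text));
    }
}
```

If the input holds several PARTITION tags, this picks the last one, because the scan starts at the right.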

Answer 4

The solution is this; I have tested it at http://regexlib.com/RETester.aspx:

<\s*SEM\s*>(?!.*</SEM>.*).*<\s*PARTITION\s*/>

Because you want the last one, the way to identify it here is to accept a SEM start tag only when no </SEM> follows it.

I have included \s* in case there are some spaces inside <SEM> or <PARTITION/>.

Basically, what we do is exclude the word </SEM> with:

(?!.*</SEM>.*)
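
Here is a short sketch of this pattern run against the sample string, with hypothetical class and method names. It also shows the condition the pattern depends on: the last SEM element must not have a closing tag after it, which holds for the string in the question but is an assumption about other inputs.

```csharp
using System;
using System.Text.RegularExpressions;

static class NoClosingTagDemo
{
    // The pattern from this answer, written as a verbatim string.
    const string Pattern = @"<\s*SEM\s*>(?!.*</SEM>.*).*<\s*PARTITION\s*/>";

    // Hypothetical helper: the matched span, or null if nothing matches.
    public static string Find(string text)
    {
        Match m = Regex.Match(text, Pattern);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        string text = "<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind";
        // Prints: <SEM>is<I>love</I>, <PARTITION />
        Console.WriteLine(Find(text));

        // If the last SEM is closed before PARTITION, the lookahead rejects it.
        Console.WriteLine(Find("<SEM>a</SEM> <PARTITION />") ?? "no match");
    }
}
```

Without RegexOptions.Singleline, `.` does not cross line breaks, so this form only looks within a single line.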

Answer 5

Have you tried this:

<SEM>.*<PARTITION\s*/>

Your regular expression was matching anything but "<" after the "SEM" tag. Therefore it would stop matching as soon as it hit any other tag.

Answer 6

Bit quick-and-dirty, but try this:

(<SEM>.*?</SEM>.*?)*(<SEM>.*?<PARTITION)

and take a look at what's in the C#/.NET equivalent of $2

The secret lies in the lazy-matching construct (.*?) --- I assume/hope C# supports this.
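
For what it's worth, .NET does support lazy quantifiers, and the counterpart of $2 there is `match.Groups[2].Value`. Below is a minimal sketch under that reading, using the sample string from the question; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyGroupDemo
{
    // The pattern from this answer: group 1 swallows complete SEM elements,
    // group 2 captures the final <SEM> up to <PARTITION.
    const string Pattern = @"(<SEM>.*?</SEM>.*?)*(<SEM>.*?<PARTITION)";

    // Hypothetical helper: the content of group 2, or null if nothing matches.
    public static string Find(string text)
    {
        Match m = Regex.Match(text, Pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    static void Main()
    {
        string text = "<SEM>electric</SEM> cu <SEM>hello</SEM> rent <SEM>is<I>love</I>, <PARTITION />mind";
        // Prints: <SEM>is<I>love</I>, <PARTITION
        Console.WriteLine(Find(text));
    }
}
```

Group 2 stops at "<PARTITION" because the pattern does not include the rest of the tag.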

Clearly, Jon Skeet's solution will perform better, but you may want to use a regex (to simplify breaking up the bits that interest you, for example).

(Disclaimer: I'm a Perl/Python/Ruby person myself...)

6 answers

Answer 1
7 2008-11-25 10:00:51

Answer 2
3 accepted 2008-11-25 11:36:11

Answer 3
2 2008-11-26 02:26:16

Answer 4
1 2008-11-25 12:32:32

Answer 5
0 2008-11-25 09:59:32

Answer 6
0 2008-11-25 10:26:11
