Python: search a POS-tagged sequence for specific word sequences and highlight them

Question

Let's say we have a sentence like this:

 string = "He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN by/IN a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT Italian/JJ club/NN ./."

and I have a list of compound words to be highlighted:

target = ['He', 'wellmade', 'ItalianAmerican']

and I want to get a result like the one below:

"[He/PRP] has/VBZ some/DT [well/RB made/VBN] clothes/NNS made/VBN by/IN a/DT [Italian/JJ American/JJ] tailor/NN in/IN the/DT Italian/JJ club/NN ./."

It is assumed that each target item is the same length as, or longer than, the corresponding token in the sentence. I think I should first find the spans that correspond to the target items and then insert the brackets, but I can't turn that into code. Please give me a hint. Thanks!
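The span-first idea described in the question can be sketched directly on tokens. This is only a minimal sketch under stated assumptions: the function name highlight and the variable names sentence and targets are hypothetical, every token is assumed to have the form word/TAG, and tokens are assumed to be separated by single spaces.

```python
def highlight(tagged, targets):
    """Bracket runs of consecutive tokens whose words concatenate to a target."""
    tokens = tagged.split()
    # The word is everything before the last '/', e.g. 'well/RB' -> 'well'.
    words = [tok.rsplit("/", 1)[0] for tok in tokens]
    wanted = set(targets)
    out = []
    i = 0
    while i < len(tokens):
        matched_end = None
        joined = ""
        # Grow a candidate span starting at token i, keeping the longest match.
        for j in range(i, len(tokens)):
            joined += words[j]
            if joined in wanted:
                matched_end = j
            # Stop as soon as no target can start with the text joined so far.
            if not any(t.startswith(joined) for t in wanted):
                break
        if matched_end is None:
            out.append(tokens[i])
            i += 1
        else:
            out.append("[" + " ".join(tokens[i:matched_end + 1]) + "]")
            i = matched_end + 1
    return " ".join(out)


sentence = ("He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN by/IN "
            "a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT Italian/JJ club/NN ./.")
targets = ["He", "wellmade", "ItalianAmerican"]
print(highlight(sentence, targets))
```

Because the match is made on whole tokens, the second Italian/JJ (followed by club/NN) is left untouched: Italian alone is not a target, and Italianclub is not a prefix of any target.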

Answer 1

It is easy with 'He'; the problems begin with 'wellmade', because it is a compound word that is split in the input string, and each part even has a suffix appended. I'd suggest turning your target items into regex patterns with optional groups: (?:\\/[A-Z]+\\s*|\\s)? should be inserted after each letter but the last, and (?:\\/[A-Z]+)? after the last letter. (The backslashes are doubled as they would be in a non-raw string literal; in a Python raw string, write a single backslash.)

Have a look at a sample regex for ItalianAmerican:

Have a look at the demo example.
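The construction described in this answer can be sketched in Python as follows. This is a hedged illustration, not the answer's own code: the names BETWEEN, AFTER, target_to_pattern and highlight_targets are hypothetical, and the (?<!\S) and (?!\S) guards that pin a match to token boundaries are an added assumption that the answer does not mention.

```python
import re

# Optional group placed after every letter except the last:
# either '/TAG' plus optional whitespace, or a single whitespace character.
BETWEEN = r"(?:\/[A-Z]+\s*|\s)?"
# Optional group placed after the last letter: the trailing '/TAG'.
AFTER = r"(?:\/[A-Z]+)?"


def target_to_pattern(target):
    """Interleave the optional groups with the (escaped) letters of one target."""
    letters = [re.escape(ch) for ch in target]
    return BETWEEN.join(letters) + AFTER


def highlight_targets(tagged, targets):
    """Wrap every match of any target pattern in square brackets."""
    alternatives = "|".join(target_to_pattern(t) for t in targets)
    # Assumption: a match must start and end at a whitespace boundary.
    pattern = re.compile(r"(?<!\S)(?:" + alternatives + r")(?!\S)")
    return pattern.sub(lambda m: "[" + m.group(0) + "]", tagged)


sentence = ("He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN by/IN "
            "a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT Italian/JJ club/NN ./.")
targets = ["He", "wellmade", "ItalianAmerican"]
print(target_to_pattern("He"))
print(highlight_targets(sentence, targets))
```

The tag class [A-Z]+ only covers tags made of uppercase letters, which is enough for the tags in the sample sentence that follow a target; tags containing other characters would need a wider class.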

Answer 2

Is this what you are looking for?

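import re
# Wrap each target, together with its tag up to the next whitespace, in brackets.
result = re.sub(r'((?:He|well.*?made|Italian.*?American).*?)(\s)', r'[\1]\2', string)
print(result)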
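One caveat with this pattern is that .*? may run across unrelated tokens, so 'well' and 'made' would also be joined when other words sit between them. A stricter variant is sketched below as an illustration only: it assumes the compounds have already been split by hand into their words (the list parts is hypothetical, as are the names STRICT and highlight_strict) and requires those words to be adjacent word/TAG tokens.

```python
import re

# Assumption: each compound target is given pre-split into its words.
parts = [["He"], ["well", "made"], ["Italian", "American"]]

# Each word must be followed by '/TAG'; the words of a compound must be
# separated only by whitespace, so nothing else can sit between them.
alternatives = [
    r"\s+".join(re.escape(word) + r"/\S+" for word in words)
    for words in parts
]
STRICT = re.compile(r"(?<!\S)(?:" + "|".join(alternatives) + r")(?!\S)")


def highlight_strict(tagged):
    """Bracket only adjacent-token matches of the pre-split targets."""
    return STRICT.sub(lambda m: "[" + m.group(0) + "]", tagged)


sentence = ("He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN by/IN "
            "a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT Italian/JJ club/NN ./.")
print(highlight_strict(sentence))

# A sentence where 'well' and 'made' are not adjacent.
loose = "well/RB said/VBD he/PRP made/VBN x/NN"
print(re.sub(r'((?:He|well.*?made|Italian.*?American).*?)(\s)', r'[\1]\2', loose))
print(highlight_strict(loose))
```

On the second sentence the original pattern brackets the whole stretch from well/RB to made/VBN, while the stricter variant leaves the sentence unchanged.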

2 answers

Answer 1
0 2015-03-20 10:01:52

Answer 2
0 Accepted 2015-03-20 10:29:07
