
Python: search for a specific word sequence in a POS-tagged sequence and highlight it

Let's say we have a sentence like this:

 string = "He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN by/IN a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT Italian/JJ club/NN ./."

and I have a list of compound words to be highlighted:

target = ['He', 'wellmade', 'ItalianAmerican']

and I want to get a result that looks like this:

"[He/PRP] has/VBZ some/DT [well/RB made/VBN] clothes/NNS made/VBN by/IN a/DT [Italian/JJ American/JJ] tailor/NN in/IN the/DT Italian/JJ club/NN ./."

It is assumed that each target item is the same length as, or longer than, the corresponding token in the sentence, so one target may cover several consecutive tokens. I think I should first find the span that corresponds to each target item and then insert the brackets, but I can't turn that into code. Please give me a hint. Thanks!

It is easy with 'He'; the problems begin with 'wellmade', because it is a compound word that is split in the input string, with tag suffixes appended to each part. I'd suggest turning your target items into regex patterns with optional groups: (?:\/[A-Z]+\s*|\s)? should be inserted after each letter but the last, and (?:\/[A-Z]+)? after the last letter.

Have a look at a sample regex for ItalianAmerican:

I(?:\\/[AZ]+\\s*|\\s)?t(?:\\/[AZ]+\\s*|\\s)?a(?:\\/[AZ]+\\s*|\\s)?l(?:\\/[AZ]+\\s*|\\s)?i(?:\\/[AZ]+\\s*|\\s)?a(?:\\/[AZ]+\\s*|\\s)?n(?:\\/[AZ]+\\s*|\\s)?A(?:\\/[AZ]+\\s*|\\s)?m(?:\\/[AZ]+\\s*|\\s)?e(?:\\/[AZ]+\\s*|\\s)?r(?:\\/[AZ]+\\s*|\\s)?i(?:\\/[AZ]+\\s*|\\s)?c(?:\\/[AZ]+\\s*|\\s)?a(?:\\/[AZ]+\\s*|\\s)?n(?:\\/[AZ]+)?

Have a look at the demo example.
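Writing such a pattern by hand for every target is tedious, so it can be generated. The following is only a minimal sketch of that idea, not the answerer's own code: the helper names `build_pattern` and `highlight` are made up for illustration, tags are assumed to consist of uppercase letters only (`[A-Z]+`), and the `(?<!\S)` / `(?!\S)` lookarounds are an addition so that a match has to start and end on a token boundary.

```python
import re

# Optional "/TAG" plus whitespace (or a bare space) allowed between two letters.
SEP = r'(?:/[A-Z]+\s*|\s)?'
# Optional "/TAG" allowed after the last letter.
TAIL = r'(?:/[A-Z]+)?'


def build_pattern(target):
    """Hypothetical helper: one regex per target, with SEP between letters."""
    return SEP.join(re.escape(ch) for ch in target) + TAIL


def highlight(tagged, targets):
    """Hypothetical helper: wrap every match of any target in brackets."""
    alternation = '|'.join(build_pattern(t) for t in targets)
    # Assumption: matches must begin and end at whitespace or string edges.
    pattern = re.compile(r'(?<!\S)(?:' + alternation + r')(?!\S)')
    return pattern.sub(lambda m: '[' + m.group(0) + ']', tagged)


sentence = ("He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN "
            "by/IN a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT "
            "Italian/JJ club/NN ./.")
targets = ['He', 'wellmade', 'ItalianAmerican']
print(highlight(sentence, targets))
```

Tags that contain other characters (for example `PRP$`) would need a wider character class than `[A-Z]+`.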

Is this what you are looking for?

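import re
# Bracket each target plus its POS tag, up to the next whitespace; `string` is the tagged sentence from the question.
result = re.sub(r'((?:He|well.*?made|Italian.*?American).*?)(\s)', r'[\1]\2', string)
print(result)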
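Note that `.*?` in the pattern above can run across unrelated tokens, for example from one "Italian" to an "American" much later in the same sentence. The span-finding idea from the question can also be sketched on tokens instead of characters. This is only an illustration under stated assumptions: `find_spans` and `bracket` are hypothetical names, each token is assumed to have the form `word/TAG`, and the output is rejoined with single spaces.

```python
def find_spans(tokens, targets):
    """Return (start, end) token index pairs whose joined words equal a target."""
    # Assumption: everything before the last "/" of a token is the word.
    words = [tok.rsplit('/', 1)[0] for tok in tokens]
    wanted = set(targets)
    spans = []
    i = 0
    while i < len(words):
        joined = ''
        match_end = None
        for j in range(i, len(words)):
            joined += words[j]
            if joined in wanted:
                match_end = j + 1
                break
            # Stop extending once no target can start with the joined words.
            if not any(t.startswith(joined) for t in wanted):
                break
        if match_end is None:
            i += 1
        else:
            spans.append((i, match_end))
            i = match_end
    return spans


def bracket(tagged, targets):
    """Insert brackets around every span reported by find_spans."""
    tokens = tagged.split()
    out = []
    pos = 0
    for start, end in find_spans(tokens, targets):
        out.extend(tokens[pos:start])
        out.append('[' + ' '.join(tokens[start:end]) + ']')
        pos = end
    out.extend(tokens[pos:])
    return ' '.join(out)


tagged_sentence = ("He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS "
                   "made/VBN by/IN a/DT Italian/JJ American/JJ tailor/NN "
                   "in/IN the/DT Italian/JJ club/NN ./.")
print(bracket(tagged_sentence, ['He', 'wellmade', 'ItalianAmerican']))
```

Because a span is only accepted when the concatenated words equal a target exactly, the second "Italian/JJ" (followed by "club/NN") is left unbracketed.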
