Python: search for specific word sequences in a POS-tagged sentence and highlight them

Question

Let's say we have a sentence like this,

 string = "He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN by/IN a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT Italian/JJ club/NN ./."

and I have a list of compound words to be highlighted.

target = ['He', 'wellmade', 'ItalianAmerican']

and I want to get a result that looks like this:

"[He/PRP] has/VBZ some/DT [well/RB made/VBN] clothes/NNS made/VBN by/IN a/DT [Italian/JJ American/JJ] tailor/NN in/IN the/DT Italian/JJ club/NN ./."

It is assumed that the length of each target item is than the corresponding tokens in a sentence. I think I should first find the span that corresponds to each target item and then insert the brackets, but I can't turn that into code. Please give me a hint. Thanks!

Answer 1

It is easy with 'He'; the problems begin with 'wellmade', because it is a compound word that is split across tokens in the input string, with a POS suffix appended to each part. I'd suggest turning your target items into regex patterns with optional groups: (?:\/[A-Z]+\s*|\s)? should be inserted after each letter but the last, and (?:\/[A-Z]+)? after the last letter.

For ItalianAmerican, the pattern built by this rule starts with I(?:\/[A-Z]+\s*|\s)?t(?:\/[A-Z]+\s*|\s)?a and continues the same way through the final n(?:\/[A-Z]+)?
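The rule above can be applied mechanically to every target. The following is a minimal sketch of that idea, not the answerer's own code: the helper names build_pattern and highlight are hypothetical, and the whitespace lookarounds and longest-first ordering are added assumptions so that a target only matches whole tokens.

```python
import re

def build_pattern(target):
    # Between letters: optionally skip a "/TAG" plus whitespace, or bare whitespace.
    between = r'(?:/[A-Z]+\s*|\s)?'
    # After the last letter: optionally consume the trailing "/TAG".
    last = r'(?:/[A-Z]+)?'
    return between.join(re.escape(ch) for ch in target) + last

def highlight(sentence, targets):
    # Assumption: try longer targets first so a short one cannot pre-empt a long one.
    ordered = sorted(targets, key=len, reverse=True)
    alternation = '|'.join(build_pattern(t) for t in ordered)
    # Assumption: a match must start and end at a whitespace boundary.
    pattern = r'(?<!\S)(?:' + alternation + r')(?!\S)'
    return re.sub(pattern, lambda m: '[' + m.group(0) + ']', sentence)

string = ("He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN by/IN "
          "a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT Italian/JJ club/NN ./.")
target = ['He', 'wellmade', 'ItalianAmerican']
print(highlight(string, target))
```

The slash is left unescaped here because Python's re module does not require escaping it. Like the pattern in the answer, this sketch assumes tags consist only of uppercase letters, so tokens tagged with something like PRP$ would not be bracketed.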

Answer 2

Is this what you are looking for?

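import re
# `string` is the tagged sentence from the question; the trailing .*? takes in the last POS tag up to the next whitespace.
result = re.sub(r'((?:He|well.*?made|Italian.*?American).*?)(\s)', r'[\1]\2', string)
print(result)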
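The alternation above is written by hand and relies on .*? to bridge the gap between the parts of a compound. The span-based idea from the question (find the tokens that make up a target, then bracket them) can also be done without regular expressions. This is only a sketch under the assumption that tokens are separated by whitespace and that each token has the form word/TAG; the function name highlight_spans is hypothetical.

```python
def highlight_spans(sentence, targets):
    tokens = sentence.split()
    # Strip the POS tag: everything after the last "/" in each token.
    words = [tok.rsplit('/', 1)[0] for tok in tokens]
    wanted = set(targets)
    max_len = max(map(len, targets), default=0)
    out, i = [], 0
    while i < len(tokens):
        match_end = None
        joined = ''
        # Grow a window of consecutive words and remember the longest target hit.
        for j in range(i, len(tokens)):
            joined += words[j]
            if len(joined) > max_len:
                break
            if joined in wanted:
                match_end = j + 1
        if match_end is None:
            out.append(tokens[i])
            i += 1
        else:
            out.append('[' + ' '.join(tokens[i:match_end]) + ']')
            i = match_end
    return ' '.join(out)

string = ("He/PRP has/VBZ some/DT well/RB made/VBN clothes/NNS made/VBN by/IN "
          "a/DT Italian/JJ American/JJ tailor/NN in/IN the/DT Italian/JJ club/NN ./.")
print(highlight_spans(string, ['He', 'wellmade', 'ItalianAmerican']))
```

Because the comparison is made on whole words, the second Italian/JJ is left alone, and the parts of a compound must be directly adjacent tokens. Note that the output is rejoined with single spaces, so any other whitespace in the input is normalized.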

2 answers

solution1
0 2015-03-20 10:01:52

solution2
0 ACCEPTED 2015-03-20 10:29:07