繁体   English   中英

如何在python中使用正则表达式从字符串中提取特定单词

[英]how to extract specific word from string using regex in python

我有两个包含单词及其类型的字符串:

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'

我喜欢将带有/NN标签的任何单词形式的单词提取到带有/NNP/CDP标签的单词中。 到目前为止,这是我的代码(仍然仅适用于/NNP标签):

import re

def entityExtractPreposition(text):
    text = re.findall(r'([^\s/]*/IN\b[^/]*(?:/(?!IN\b)[^/]*)*/NNP\b)', text)
    return text

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
prepo1 = entityExtractPreposition(text1)

text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'
prepo2 = entityExtractPreposition(text2)

print text1
print prepo1
print ''
print text2
print prepo2

到目前为止的代码结果:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP']

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']

如我们所见,对于第一个字符串( text1 ), entityExtractPreposition仍然无法获得33/CDP 如何通过text1中的/CDP标记或text2中的/NNP来使entityExtractPreposition工作正常?

预期结果是:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP 33/CDP']

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']

谢谢

\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM