How to extract specific word from string using regex in Python

I have two strings that contain words together with their part-of-speech tags:

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'

I would like to extract the span that starts at a word tagged /IN and runs through the words tagged /NNP and /CDP. Here is my code so far (it only works with the /NNP tag):

import re

def entityExtractPreposition(text):
    # Start at a token tagged /IN, skip over tags that are not /IN,
    # and stop at a /NNP tag (this is why a trailing /CDP is missed).
    return re.findall(r'([^\s/]*/IN\b[^/]*(?:/(?!IN\b)[^/]*)*/NNP\b)', text)

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
prepo1 = entityExtractPreposition(text1)

text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'
prepo2 = entityExtractPreposition(text2)

# print() calls so the script runs on Python 3
print(text1)
print(prepo1)
print('')
print(text2)
print(prepo2)

The output of the code so far:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP']

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']

As you can see, for the first string (text1) entityExtractPreposition still fails to capture 33/CDP. How can I make entityExtractPreposition work both with the /CDP tag in text1 and with the /NNP tag in text2?

The expected result is:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP 33/CDP']

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']

Thanks

Try this pattern, which accepts either /NNP or /CDP as the closing tag:

\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b
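
A minimal sketch of how the pattern above could be dropped into the function from the question. The name PATTERN is only illustrative and not part of the original post. The pattern begins at a token tagged /IN, then consumes characters greedily without crossing another /IN tag, and backtracks to the last /NNP or /CDP tag in that stretch. Because there is no capturing group, re.findall returns the full matches.

```python
import re

# Illustrative name (assumption): the compiled pattern from the answer.
# (?:(?!/IN\b).)* consumes any character that does not begin another /IN tag;
# /(?:NNP|CDP)\b then anchors the end of the match on either closing tag.
PATTERN = re.compile(r'\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b')

def entityExtractPreposition(text):
    # No capturing groups, so findall yields the whole matched spans.
    return PATTERN.findall(text)

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'

print(entityExtractPreposition(text1))  # ['at/IN Yasmin/NNP 33/CDP']
print(entityExtractPreposition(text2))  # ['at/IN Jl/NNP Halimun/NNP Raya/NNP']
```

Note that the match is greedy: it extends to the last /NNP or /CDP tag before the next /IN token, so any other tagged words lying between two such tags are included in the result as well.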
