
Extract entity using regex in Python

What I have: "peak hour traffic at location1 towards location2, location3 towards location4 and location5 towards location6."

Example: "peak hour traffic at ulsoor lake jn towards nagatheatre jn, okalipuram towards majestic and bamboo bazaar jn towards cole's park jn."

What I want: extract the locations using regex in Python.

Example:

[('ulsoor lake jn', 'nagatheatre jn'), ('okalipuram', 'majestic'), ('bamboo bazaar jn', "cole's park jn")]

What I have done:

>>> import re
>>> regex1 = r'(?:\sat\s|,|and)(.*) towards (.*)(?:\.|,|and)'
>>> re.search(regex1, "peak hour traffic at ulsoor lake jn towards nagatheatre jn, okalipuram towards majestic and bamboo bazaar jn towards cole's park jn.").groups()
('ulsoor lake jn towards nagatheatre jn, okalipuram towards majestic and bamboo bazaar jn',
 "cole's park jn")

What I am getting:

('ulsoor lake jn towards nagatheatre jn, okalipuram towards majestic and bamboo bazaar jn', "cole's park jn")

As can be seen, it only matches the outermost expression, even though there are sub-expressions that also match the pattern. Please help. Thank you.

You actually need three things. First, as my comment said, use (.*?) instead of (.*) so that your captures are not greedy.

Second, use a look-ahead assertion so that the regex engine does not consume the delimiter when determining where a capture ends. The next match can then start at that same delimiter.

Third, use findall instead of search, since search returns only the first match.
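To see what each change does on its own, here is a minimal sketch on a shortened, made-up sentence (the string `s` and the variable names are illustrative only, not from the question). It compares greedy captures, lazy captures with a consuming delimiter, and lazy captures with a look-ahead:

```python
import re

# Shortened, made-up sentence used only for illustration.
s = "traffic at a towards b, c towards d."

# Greedy captures: the first (.*) runs up to the LAST " towards ".
greedy = re.findall(r"(?:\sat\s|,|and)(.*) towards (.*)(?:\.|,|and)", s)

# Lazy captures, but the trailing delimiter is consumed by the match,
# so the comma that should introduce the second pair is already used up.
consuming = re.findall(r"(?:\sat\s|,|and)(.*?) towards (.*?)(?:\.|,|and)", s)

# Lazy captures plus a look-ahead: the delimiter is checked but not consumed,
# so it is still available to start the next match.
lookahead = re.findall(r"(?:\sat\s|,|and)(.*?) towards (.*?)(?=\.|,|and)", s)

print(greedy)     # [('a towards b, c', 'd')]
print(consuming)  # [('a', 'b')]
print(lookahead)  # [('a', 'b'), (' c', 'd')]
```

Only the last variant finds both pairs, which is why the non-greedy captures and the look-ahead are needed together.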

>>> import re
>>> r = re.compile(r'(?:\sat\s|,|and)(.*?) towards (.*?)(?=\.|,|and)')
>>> s = "peak hour traffic at ulsoor lake jn towards nagatheatre jn, okalipuram towards majestic and bamboo bazaar jn towards cole's park jn."
>>> r.findall(s)
[('ulsoor lake jn', 'nagatheatre jn'), (' okalipuram', 'majestic '), (' bamboo bazaar jn', "cole's park jn")]
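Note that some of the captured names above still carry leading or trailing spaces. One possible refinement, offered as a sketch rather than as part of the original answer, is to let `\s*` absorb the whitespace outside the groups and to put word boundaries (`\b`) around `at` and `and` so that they only match as whole words. The name `pattern` and the second test sentence are made up for illustration:

```python
import re

# Variation on the answer's pattern (an assumption, not the original author's code):
# - \s* / \s+ outside the groups keep surrounding spaces out of the captures
# - \b makes "at" and "and" match only as whole words
pattern = re.compile(
    r"(?:\bat\b|,|\band\b)\s*(.*?)\s+towards\s+(.*?)\s*(?=\.|,|\band\b)"
)

s = ("peak hour traffic at ulsoor lake jn towards nagatheatre jn, "
     "okalipuram towards majestic and bamboo bazaar jn towards cole's park jn.")
pairs = pattern.findall(s)
print(pairs)

# Made-up sentence whose place names contain the letters "and".
tricky = "traffic at sandal road towards anand circle."
tricky_pairs = pattern.findall(tricky)
print(tricky_pairs)
```

On the question's sentence this yields the exact list requested. Without the word boundaries, a name such as "anand circle" would be cut short at the "and" inside it. An alternative that leaves the pattern unchanged is to call `str.strip()` on each captured string afterwards.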

