
Having problems splitting natural language phrase in RegEx

I'm attempting to write a RegEx pattern that pulls the key phrases out of a natural-language phrase in order to build a query and return the data. Everything went smoothly until I ran into an issue trying to efficiently pull the main subject from the sentence. For example:

Let's assume my phrase is "Show me all tickets that were closed last month". I can parse each element needed to build the query. However, if I attempt something like "show me all tickets and requests that were closed last week", it all comes crashing down.

I'm having difficulty capturing both subjects (tickets and requests). Ideally they would go into separate named groups, such as Measures: "tickets", "requests" and Logic: "and". Note that some measures may contain spaces, so that must be accounted for as well.

I've only been able to come up with this so far:

(\S+\s?)+(?=and|or)

which, with a test phrase of "#sla met and tickets", only pulls out "#sla met".

I only started working with regular expressions yesterday, so any tips would be most helpful!

A quick answer that addresses only one very narrow part of the problem:

(.+)((and|or)(.+))

This will grab any number of terms joined with "and" or "or". It will not capture each term separately for you, but you could split the results on "and" and "or". Of course, you could get the same overall match using .+ on its own.
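As a rough illustration of the splitting idea, here is a minimal Python sketch (Python is just a convenient choice; the question does not name a regex flavour). The helper name split_measures, and the rule that a connective only counts when it has whitespace on both sides, are assumptions made for the example rather than anything from the question.

```python
import re

# Assumption: connectives are the standalone words "and" / "or",
# surrounded by whitespace. The capture group makes split() keep them.
CONNECTIVE = re.compile(r"\s+(and|or)\s+", re.IGNORECASE)

def split_measures(subject):
    """Split an already-isolated subject phrase into (measures, logic)."""
    parts = CONNECTIVE.split(subject.strip())
    measures = parts[0::2]                       # text between connectives
    logic = [op.lower() for op in parts[1::2]]   # the connectives themselves
    return measures, logic

print(split_measures("#sla met and tickets"))
# (['#sla met', 'tickets'], ['and'])
print(split_measures("tickets and requests or orders"))
# (['tickets', 'requests', 'orders'], ['and', 'or'])
```

Measures containing spaces survive because the split happens only on the connective, and "orders" is left alone because the "or" inside it has no whitespace around it. This still assumes the subject has already been cut out of the full sentence; a trailing clause such as "that were closed last week" would stay attached to the last measure.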

Do you see the problem? Regular expressions are not going to allow you to parse natural language. You're attempting to tunnel through a mountain using a spoon. I actually had to delete and recreate my answer because I spent five minutes just trying to get the capturing working and eventually gave up. That's how insufficient regex is for this task.

If you truly want to work on parsing natural language, you need to start reading research papers. A lot of them.

Edit: Here is a regex that will find multiple matches (NOT a single match with multiple groups), each match having a single capture group that is the item.

(?:\s+(?:and|or)\s+)?(\S+)

Disclaimer: There are many ways to fool this regex. I can think of three or four right now, but there are certainly more than that.
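To make the disclaimer concrete, here is a small Python sketch that runs the regex above over a few phrases. The function name items and the sample phrases are only illustrative assumptions; the pattern itself is copied unchanged from the edit.

```python
import re

# Pattern from the edit above: an optional " and " / " or " prefix,
# then one run of non-whitespace captured as the item.
ITEM = re.compile(r"(?:\s+(?:and|or)\s+)?(\S+)")

def items(text):
    # With a single capture group, findall returns that group per match.
    return ITEM.findall(text)

print(items("tickets and requests"))   # ['tickets', 'requests']
print(items("#sla met and tickets"))   # ['#sla', 'met', 'tickets']
print(items("tickets, requests"))      # ['tickets,', 'requests']
```

The first phrase works as intended. The second shows one way to fool it: a measure containing a space ("#sla met") comes back as two separate items. The third shows that punctuation is kept as part of the item.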
