
Extract strings between two words that are supplied from two lists respectively

I have a text which looks like an email body as follows.

To: Abc Cohen <abc.cohen@email.com> Cc: <braggis.mathew@nomail.com>,<samanth.castillo@email.com> Hi 
Abc, I happened to see your report. I have not seen any abnormalities and thus I don't think we 
should proceed to Braggis. I am open to your thought as well. Regards, Abc On Tue 23 Jul 2017 07:22 PM Tony Stark wrote:

I also have two lists of keywords, as follows.

no_wds = ["No","don't","Can't","Not"]
yes_wds = ["Proceed","Approve","May go ahead"]

Objective: I want to search the text given above and, if any of the keywords listed above are present, extract the strings between those keywords. In this case, the keywords Not and don't from no_wds are matched, and so is the keyword Proceed from yes_wds. The text I want extracted, as a list, is as follows:

txt = ["seen any abnormalities and thus I don't think we should", "think we should"]

My approach:

I have tried

 re.findall(r'{}(.*){}'.format(re.escape('|'.join(no_wds)),re.escape('|'.join(yes_wds))),text,re.I)

Or

text_f = []
for i in no_wds:
  for j in yes_wds:
    t = re.findall(r'{}(.*){}'.format(re.escape(i),re.escape(j)),text, re.I)
    text_f.append(t)

Neither attempt gave a suitable result. I then tried the str.find() method, also without success.

I tried to get a clue from here .

Can anybody help me solve this? I am especially keen to see a non-regex solution, as regex is not always a good fit. That said, a regex-based solution in which I can iterate over the lists is also welcome.
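On the regex attempts above: calling re.escape() on the result of '|'.join(...) also escapes the | characters, so the pattern looks for the literal joined text instead of any one keyword. The greedy (.*) also runs to the last possible end keyword rather than the nearest one. The following is a minimal sketch of a regex version that iterates over the lists. The function name between_keywords, the shortened sample text, the \b word boundaries and the sorting by match position are assumptions made for illustration, not something taken from the post.

```python
import re

# Shortened sample text (an assumption; the full email body works the same way).
text = ("I happened to see your report. I have not seen any abnormalities "
        "and thus I don't think we should proceed to Braggis.")

no_wds = ["No", "don't", "Can't", "Not"]
yes_wds = ["Proceed", "Approve", "May go ahead"]


def between_keywords(text, starts, ends):
    """Return the text between each start keyword and the nearest end keyword."""
    # Escape every keyword on its own and only then join with '|'.
    end_alt = "|".join(re.escape(w) for w in ends)
    hits = []
    for start in starts:
        # \b stops "No" from matching inside "not" or "abnormalities".
        # (.*?) is lazy, so it stops at the nearest end keyword.
        pattern = r"\b{}\b\s*(.*?)\s*\b(?:{})\b".format(re.escape(start), end_alt)
        for m in re.finditer(pattern, text, flags=re.I | re.S):
            hits.append((m.start(), m.group(1)))
    # Order the results by where the start keyword occurs in the text.
    return [middle for _, middle in sorted(hits)]


print(between_keywords(text, no_wds, yes_wds))
```

With this sample text the two extracted strings are the ones listed in the objective above. Note that re.finditer returns non-overlapping matches per start keyword, so a keyword that appears again inside an already matched span is not reported a second time.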

Loop through the list containing the keys and use each key as the separator for a split (whatever.split(key)).
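A rough sketch of that splitting idea, without regex, could look like the following. It uses str.partition, which splits at the first occurrence of a separator. The function name split_between and the shortened sample text are made up for illustration, and the sketch assumes that keywords are surrounded by spaces (a keyword followed directly by punctuation, such as "proceed.", would be missed) and that lower() does not change the length of the text, which holds for ASCII.

```python
# Shortened sample text (an assumption for illustration).
text = ("I have not seen any abnormalities and thus I don't think we "
        "should proceed to Braggis.")

no_wds = ["No", "don't", "Can't", "Not"]
yes_wds = ["Proceed", "Approve", "May go ahead"]


def split_between(text, starts, ends):
    """Use each keyword as a splitter and collect what lies in between."""
    # Collapse whitespace and pad with spaces so " no " matches whole words
    # only, not the "no" inside "not".
    padded = " " + " ".join(text.split()) + " "
    lowered = padded.lower()  # compare in lower case, slice the original
    found = []
    for start in starts:
        before, sep, rest = lowered.partition(" " + start.lower() + " ")
        if not sep:
            continue  # this start keyword is not in the text
        begin = len(before) + len(sep)  # position right after the keyword
        for end in ends:
            middle, sep2, _ = rest.partition(" " + end.lower() + " ")
            if sep2:
                found.append(padded[begin:begin + len(middle)])
    return found


print(split_between(text, no_wds, yes_wds))
```

Only the first occurrence of each start keyword is considered, and the results come out in the order of the keyword list rather than in text order.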

EDIT:

I am not doing your homework, but this should get you on your way:

I split the message at every space, looped through the resulting list looking for the keywords, and added the index of each hit to a list. I then used those indexes to slice the message. It is probably worth trying to slice the message without splitting it first, but I am not going to do your homework. You must also find a way to automate the process when there are more indexes. Tip: check that the number of indexes is even, or you are going to have a bad time slicing. Note that you should also replace the \n characters and find a way to sort the key lists.

message = """To: Abc Cohen <abc.cohen@email.com> Cc: <braggis.mathew@nomail.com>,<samanth.castillo@email.com> Hi 
Abc, I happened to see your report. I have not seen any abnormalities and thus I don't think we 
should proceed to Braggis. I am open to your thought as well. Regards, Abc On Tue 23 Jul 2017 07:22"""

no_wds = ["No","don't","Can't","Not"]
yes_wds = ["Proceed","Approve","May go ahead"]

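# split() with no argument splits on any whitespace, so "\n" is removed too
splittedMessage = message.split()
# zip(no_wds, yes_wds) would stop at the shorter list and skip "Not",
# so each list is checked on its own, in lower case
no_keys = [w.lower() for w in no_wds]
yes_keys = [w.lower() for w in yes_wds]

no_hits = []   # indexes of words found in no_wds
yes_hits = []  # indexes of words found in yes_wds
for i, word in enumerate(splittedMessage):
    temp = word.lower()
    if temp in no_keys:
        no_hits.append(i)
    elif temp in yes_keys:
        yes_hits.append(i)

# Slice from just after the first "no" keyword up to the first "yes" keyword
if no_hits and yes_hits:
    found = ' '.join(splittedMessage[no_hits[0] + 1:yes_hits[0]])
    print(found)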
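One possible way to automate the slicing when there are more hits is to pair every start keyword with the first end keyword that follows it. The sketch below does that on the split message and also copes with multi-word keys such as "May go ahead". The helper names find_hits and slices_between are hypothetical, and stripping punctuation with string.punctuation before comparing is an assumption about how the keys should match.

```python
import string

message = """I have not seen any abnormalities and thus I don't think we
should proceed to Braggis."""

no_wds = ["No", "don't", "Can't", "Not"]
yes_wds = ["Proceed", "Approve", "May go ahead"]


def find_hits(tokens, keywords):
    """Return sorted (index, length) pairs for every keyword found in tokens."""
    hits = []
    for kw in keywords:
        parts = kw.lower().split()  # a key may span several words
        for i in range(len(tokens) - len(parts) + 1):
            if tokens[i:i + len(parts)] == parts:
                hits.append((i, len(parts)))
    return sorted(hits)


def slices_between(message, starts, ends):
    words = message.split()  # splits on any whitespace, including "\n"
    # Lower case and strip surrounding punctuation, for comparison only.
    tokens = [w.lower().strip(string.punctuation) for w in words]
    end_hits = find_hits(tokens, ends)
    results = []
    for i, n in find_hits(tokens, starts):
        # First end keyword that begins after this start keyword.
        later = [j for j, _ in end_hits if j >= i + n]
        if later:
            results.append(" ".join(words[i + n:later[0]]))
    return results


print(slices_between(message, no_wds, yes_wds))
```

Because the hits are sorted by index, the results come out in text order, and a start keyword with no end keyword after it is skipped instead of breaking the slice.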
