How to make regex go line by line to match two strings at the same time?

The question is worded a bit weird, but I didn't know how else to ask it.

I am using WordNet to pull some definitions, and I need regex to pull both the part of speech and the definition from the output. For example, if I look up the word study, the output looks like this:

Overview of verb study

1. reading, blah, blah (to read a book with the intent of learning)
2. blah blah blah (second definition of study)

Overview of noun study

1. blah blah blah (the object of ones study)
2. yadda yadda yadda (second definition of study)

I want to get this returned...

[('verb', 'to read a book with the intent of learning'), ('verb', 'second definition of study'), ('noun', 'the object of ones study'), ('noun', 'second definition of study')]

I have two regexes that match what I want, but I can't figure out how to go through the data so that each definition ends up paired with its part of speech. Any ideas?

EDIT:

adding regex patterns

stripped_defs = re.findall(r'^\s*\d+\..*\(([^)"]+)', definitions, re.M)
pos = re.findall(r'Overview of (\w+)', definitions)

My approach (where text holds the WordNet output):

  1. Split the text on the "Overview of ..." headers. Because the pattern has a capture group, re.split keeps each part of speech in the result:

     >>> re.split(r'Overview of (\w+) study', text)[1:]
     ['verb', '\n\n1. reading, blah, blah (to read a book with the intent of learning)\n2. blah blah blah (second definition of study)\n\n', 'noun', '\n\n1. blah blah blah (the object of ones study)\n2. yadda yadda yadda (second definition of study)']
     >>> l = re.split(r'Overview of (\w+) study', text)[1:]

  2. Group that list into [part of speech, body] pairs:

     >>> [l[i:i+2] for i in range(0, len(l), 2)]
     [['verb', '\n\n1. reading, blah, blah (to read a book with the intent of learning)\n2. blah blah blah (second definition of study)\n\n'], ['noun', '\n\n1. blah blah blah (the object of ones study)\n2. yadda yadda yadda (second definition of study)']]
     >>> l = [l[i:i+2] for i in range(0, len(l), 2)]

Then we can simply do:

>>> [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]
[[('verb', 'to read a book with the intent of learning'),
  ('verb', 'second definition of study')],

 [('noun', 'the object of ones study'),
  ('noun', 'second definition of study')]]

To get your expected output, flatten the nested lists:

final_list = []
for i in [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]:
    final_list.extend(i)

print(final_list)

Which gives:

[('verb', 'to read a book with the intent of learning'),
 ('verb', 'second definition of study'),

 ('noun', 'the object of ones study'),
 ('noun', 'second definition of study')]

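Complete code:

import re

# Split on the headers; the capture group keeps each part of speech
l = re.split(r'Overview of (\w+) study', text)[1:]
# Group into [part of speech, body] pairs
l = [l[i:i+2] for i in range(0, len(l), 2)]

# Flatten the per-section lists (skip this step if the nested result is fine)
final_list = []

# The capture group returns the text inside the parentheses, without them
for i in [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]:
    final_list.extend(i)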
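
As an alternative to splitting, the two patterns from the question can be combined into one alternation and walked in a single pass with re.finditer, remembering the most recent part of speech. This is only a sketch: the function name pair_definitions and the group names pos and definition are made up for illustration, and it assumes every definition sits in the last pair of parentheses on a numbered line, as in the sample output.

```python
import re

text = """Overview of verb study

1. reading, blah, blah (to read a book with the intent of learning)
2. blah blah blah (second definition of study)

Overview of noun study

1. blah blah blah (the object of ones study)
2. yadda yadda yadda (second definition of study)
"""

# One pattern, two alternatives: a header line or a numbered definition line.
# Only one of the two named groups is set for any given match.
LINE_RE = re.compile(
    r'^Overview of (?P<pos>\w+)'                 # header: part of speech
    r'|^\s*\d+\..*\((?P<definition>[^)"]+)\)',   # numbered line: text in the last (...)
    re.M,
)


def pair_definitions(text):
    """Return (part of speech, definition) tuples in document order."""
    pairs = []
    current_pos = None  # part of speech from the most recent header
    for m in LINE_RE.finditer(text):
        if m.group('pos'):
            current_pos = m.group('pos')
        elif current_pos is not None:
            # Definitions before any header are skipped (an assumption).
            pairs.append((current_pos, m.group('definition')))
    return pairs


result = pair_definitions(text)
print(result)
```

Because matches come back in document order, each definition is paired with whichever header preceded it, so no regrouping or flattening step is needed.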
