
How to make regex go line by line to match two strings at the same time?

The question is worded a bit weirdly, but I didn't know how else to ask it.

I am using WordNet to pull some definitions, and I need a regex to pull both the part of speech and the definition from the output. If I look up the word study, the output looks like this:

Overview of verb study

1. reading, blah, blah (to read a book with the intent of learning)
2. blah blah blah (second definition of study)

Overview of noun study

1. blah blah blah (the object of ones study)
2. yadda yadda yadda (second definition of study)

I want to get this returned:

[('verb', 'to read a book with the intent of learning'), ('verb', 'second definition of study'), ('noun', 'the object of ones study'), ('noun', 'second definition of study')]

I have two regular expressions that match what I want, but I can't figure out how to go through the data so that I end up with the data structure I want. Any ideas?

EDIT:

Adding the regex patterns:

stripped_defs = re.findall(r'^\s*\d+\..*\(([^)"]+)', definitions, re.M)
pos = re.findall(r'Overview of (\w+)', definitions)
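
To make the problem concrete, here is a small sketch that runs the two patterns on the sample output. The `definitions` string is reconstructed from the example above, so its exact whitespace is an assumption about what WordNet really prints.

```python
import re

# Sample text reconstructed from the question; real WordNet output may differ.
definitions = (
    "Overview of verb study\n"
    "\n"
    "1. reading, blah, blah (to read a book with the intent of learning)\n"
    "2. blah blah blah (second definition of study)\n"
    "\n"
    "Overview of noun study\n"
    "\n"
    "1. blah blah blah (the object of ones study)\n"
    "2. yadda yadda yadda (second definition of study)\n"
)

stripped_defs = re.findall(r'^\s*\d+\..*\(([^)"]+)', definitions, re.M)
pos = re.findall(r'Overview of (\w+)', definitions)

print(pos)                 # ['verb', 'noun']
print(len(stripped_defs))  # 4
```

Each call returns a flat list, so nothing records which definition belongs to which part of speech. That association is what has to be rebuilt.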

My approach (text is the input string):

  1. Split the text on the Overview of ... headers:

     >>> re.split(r'Overview of (\w+) study', text)[1:]
     ['verb',
      '\n\n1. reading, blah, blah (to read a book with the intent of learning)\n2. blah blah blah (second definition of study)\n\n',
      'noun',
      '\n\n1. blah blah blah (the object of ones study)\n2. yadda yadda yadda (second definition of study)']
     >>> l = re.split(r'Overview of (\w+) study', text)[1:]
  2. Group that list into [part of speech, text] pairs:

     >>> [l[i:i+2] for i in range(0, len(l), 2)]
     [['verb',
       '\n\n1. reading, blah, blah (to read a book with the intent of learning)\n2. blah blah blah (second definition of study)\n\n'],
      ['noun',
       '\n\n1. blah blah blah (the object of ones study)\n2. yadda yadda yadda (second definition of study)']]
     >>> l = [l[i:i+2] for i in range(0, len(l), 2)]
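
The two steps above can also be sketched with `zip` over alternating slices instead of index arithmetic. This is only an alternative spelling of the same idea. The sample `text` is reconstructed from the question, and the names `parts` and `pairs` are made up for this illustration.

```python
import re

# Sample text reconstructed from the question; real WordNet output may differ.
text = (
    "Overview of verb study\n\n"
    "1. reading, blah, blah (to read a book with the intent of learning)\n"
    "2. blah blah blah (second definition of study)\n\n"
    "Overview of noun study\n\n"
    "1. blah blah blah (the object of ones study)\n"
    "2. yadda yadda yadda (second definition of study)"
)

# Because the pattern has a capturing group, re.split keeps the captured
# part of speech in the result. [1:] drops the empty string before the
# first header.
parts = re.split(r'Overview of (\w+) study', text)[1:]

# Even indexes hold a part of speech, odd indexes hold the text after it.
pairs = list(zip(parts[0::2], parts[1::2]))

for part_of_speech, body in pairs:
    print(part_of_speech, repr(body))
```

Here `pairs` holds tuples rather than the two-element lists in `l`, but it unpacks the same way in a `for i, j in ...` loop.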

Then we can simply do:

>>> [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]
[[('verb', 'to read a book with the intent of learning'),
  ('verb', 'second definition of study')],
 [('noun', 'the object of ones study'),
  ('noun', 'second definition of study')]]

To get your expected output, flatten the nested list:

final_list = []
# the capturing group inside \( ... \) keeps the parentheses out of the result
for group in [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]:
    final_list.extend(group)

print(final_list)

Which gives:

[('verb', 'to read a book with the intent of learning'),
 ('verb', 'second definition of study'),
 ('noun', 'the object of ones study'),
 ('noun', 'second definition of study')]

Full code:

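import re

l = re.split(r'Overview of (\w+) study', text)[1:]
l = [l[i:i+2] for i in range(0, len(l), 2)]

# flatten into one list; skip this step if a nested list per part of speech is fine
final_list = []

# the capturing group inside \( ... \) keeps the parentheses out of the result
for group in [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]:
    final_list.extend(group)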
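
Since the question asks about going line by line, here is a sketch of that alternative. It is a different technique from the split-based code above: it scans the lines once and remembers the most recent part of speech. The function name `parse_overview` and the sample `text` are assumptions made for this illustration. The two patterns are the ones from the question.

```python
import re

HEADER = re.compile(r'Overview of (\w+)')
DEFINITION = re.compile(r'^\s*\d+\..*\(([^)"]+)')


def parse_overview(text):
    """Return (part_of_speech, definition) tuples in order of appearance."""
    results = []
    current_pos = None  # part of speech from the most recent header line
    for line in text.splitlines():
        header_match = HEADER.match(line)
        if header_match:
            current_pos = header_match.group(1)
            continue
        definition_match = DEFINITION.match(line)
        # Ignore numbered lines that appear before any header.
        if definition_match and current_pos is not None:
            results.append((current_pos, definition_match.group(1)))
    return results


# Sample text reconstructed from the question; real WordNet output may differ.
text = (
    "Overview of verb study\n\n"
    "1. reading, blah, blah (to read a book with the intent of learning)\n"
    "2. blah blah blah (second definition of study)\n\n"
    "Overview of noun study\n\n"
    "1. blah blah blah (the object of ones study)\n"
    "2. yadda yadda yadda (second definition of study)"
)

result = parse_overview(text)
for pair in result:
    print(pair)
```

This version does not rely on the looked-up word appearing in the header, and it only captures the last parenthesised group on each numbered line, which may or may not be what you want for lines containing several parentheses.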

