
How to make regex go line by line to match two strings at the same time?

This question is worded a bit oddly, but I don't know how else to ask it.

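I'm using WordNet to pull some definitions, and I need a regex to extract the part of speech and the definition from output like the following, which is what I get when I look up the word study: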

Overview of verb study

1. reading, blah, blah (to read a book with the intent of learning)
2. blah blah blah (second definition of study)

Overview of noun study

1. blah blah blah (the object of ones study)
2. yadda yadda yadda (second definition of study)

I want this to return...

[('verb', 'to read a book with the intent of learning'), ('verb', 'second definition of study'), ('noun', 'the object of ones study'), ('noun', 'second definition of study')]

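I have two regular expressions that match the pieces I want, but I can't figure out how to step through the data so that I end up with the data structure I need. Any ideas?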

Edit:

Adding the regex patterns:

stripped_defs = re.findall(r'^\s*\d+\..*\(([^)"]+)', definitions, re.M)
pos = re.findall(r'Overview of (\w+)', definitions)
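
To make the difficulty concrete, here is a minimal sketch that runs the two patterns above on the sample output. The `definitions` string is reconstructed from the listing in the question and is only an assumption about what the real WordNet output looks like. Each `findall` call returns its own flat list, so nothing ties a definition back to the heading it appeared under.

```python
import re

# Sample text reconstructed from the listing in the question (an assumption
# about the real WordNet output format).
definitions = """Overview of verb study

1. reading, blah, blah (to read a book with the intent of learning)
2. blah blah blah (second definition of study)

Overview of noun study

1. blah blah blah (the object of ones study)
2. yadda yadda yadda (second definition of study)
"""

stripped_defs = re.findall(r'^\s*\d+\..*\(([^)"]+)', definitions, re.M)
pos = re.findall(r'Overview of (\w+)', definitions)

print(pos)            # the two headings
print(stripped_defs)  # the four definitions, with no link to a heading
```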

My approach (text is the text):

  1. Split it on Overview of...:

     >>> re.split(r'Overview of (\w+) study', text)[1:]
     ['verb', '\n\n1. reading, blah, blah (to read a book with the intent of learning)\n2. blah blah blah (second definition of study)\n\n', 'noun', '\n\n1. blah blah blah (the object of ones study)\n2. yadda yadda yadda (second definition of study)']
     >>> l = re.split(r'Overview of (\w+) study', text)[1:]

  2. Split the list into [part of speech, text] pairs like this:

     >>> [l[i:i+2] for i in range(0, len(l), 2)]
     [['verb', '\n\n1. reading, blah, blah (to read a book with the intent of learning)\n2. blah blah blah (second definition of study)\n\n'], ['noun', '\n\n1. blah blah blah (the object of ones study)\n2. yadda yadda yadda (second definition of study)']]
     >>> l = [l[i:i+2] for i in range(0, len(l), 2)]
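
The two steps above rely on the fact that `re.split` keeps the text captured by a group in its result list. A small sketch on a made-up string (`tiny` is purely illustrative) shows this, along with `zip` over slices as another way to form the same pairs:

```python
import re

# A tiny made-up input, only to show the shape of the result.
tiny = "Overview of verb study\nA\nOverview of noun study\nB"

# Because the pattern has a capturing group, the captured word is kept
# in the output between the surrounding pieces of text.
parts = re.split(r'Overview of (\w+) study', tiny)
print(parts)

# parts[0] is whatever came before the first heading (empty here), so the
# headings sit at odd indexes and their bodies at the following even ones.
pairs = list(zip(parts[1::2], parts[2::2]))
print(pairs)
```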

Then we can simply do:

>>> [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]
[[('verb', 'to read a book with the intent of learning'),
  ('verb', 'second definition of study')],
 [('noun', 'the object of ones study'),
  ('noun', 'second definition of study')]]

To get your expected output, flatten the nested list:

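final_list = []
# Note the capturing group inside the escaped parentheses: without it,
# findall would return the text with its surrounding parentheses.
for pairs in [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]:
    final_list.extend(pairs)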

print(final_list)

This gives:

[('verb', 'to read a book with the intent of learning'),
 ('verb', 'second definition of study'),
 ('noun', 'the object of ones study'),
 ('noun', 'second definition of study')]

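Code:

import re

l = re.split(r'Overview of (\w+) study', text)[1:]
l = [l[i:i+2] for i in range(0, len(l), 2)]

# Flatten the per-section lists into one list of (pos, definition) tuples.
# If grouping by part of speech is fine, keep the nested comprehension result as is.
final_list = []

for pairs in [[(i, k) for k in re.findall(r'\((.+?)\)', j)] for i, j in l]:
    final_list.extend(pairs)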
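
As an alternative sketch, not part of the answer above, the text can also be processed literally line by line, as the title asks: remember the most recent part of speech seen in a heading and attach it to every definition line that follows. It reuses the two patterns from the question; the function name `extract_definitions` and the `sample` string are illustrative assumptions.

```python
import re

HEADER = re.compile(r'Overview of (\w+)')
DEFINITION = re.compile(r'^\s*\d+\..*\(([^)"]+)')

def extract_definitions(definitions):
    """Return (part_of_speech, definition) tuples in document order."""
    results = []
    current_pos = None  # part of speech from the most recent heading
    for line in definitions.splitlines():
        header = HEADER.search(line)
        if header:
            current_pos = header.group(1)
            continue
        definition = DEFINITION.search(line)
        # Ignore numbered lines that appear before any heading.
        if definition and current_pos is not None:
            results.append((current_pos, definition.group(1)))
    return results

# Sample text reconstructed from the listing in the question.
sample = """Overview of verb study

1. reading, blah, blah (to read a book with the intent of learning)
2. blah blah blah (second definition of study)

Overview of noun study

1. blah blah blah (the object of ones study)
2. yadda yadda yadda (second definition of study)
"""

result = extract_definitions(sample)
print(result)
```

This variant does not depend on the looked-up word appearing in the heading pattern, so it should carry over to other words as long as the headings keep the Overview of ... form.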

