Finding a pattern using regular expressions

Question

Here is my test example:

JT  - American journal of public health
JID - 1254074
SB  - AIM
SB  - IM
MH  - Adult
MH  - Biomedical Research/*organization & administration
MH  - Female
MH  - Health Care Reform/*history/*methods
AB  - OBJECTIVES: We assessed whether a 2-phase labeling and choice 
AB-  architecture intervention
OWN - NLM
STAT- MEDLINE
DCOM- 20120417
LR  - 20170220
IS  - 1541-0048 (Electronic)

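How can I write a regular expression that picks out only the vocabulary terms from the lines that start with "MH", so that I can then import them into an Excel worksheet? The output should look like this: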

[Adult, Biomedical Research, organization & administration, Female, Health Care Reform, history, methods].

Here is my attempt:

import re
Path = "MH\s*.*" 
re.findall(Path,file)

I know this is wrong, but I don't know how to fix it.

Thanks.

Answer 1

Use re.findall.

Demo:

import re
s = """JT  - American journal of public health
JID - 1254074
SB  - AIM
SB  - IM
MH  - Adult
MH  - Biomedical Research/*organization & administration
MH  - Female
MH  - Health Care Reform/*history/*methods
AB  - OBJECTIVES: We assessed whether a 2-phase labeling and choice 
AB-  architecture intervention
OWN - NLM
STAT- MEDLINE
DCOM- 20120417
LR  - 20170220
IS  - 1541-0048 (Electronic)"""

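res = []
# ^ together with re.MULTILINE anchors the match to the start of each line
for i in re.findall(r"^MH\s+-\s+(.*)", s, flags=re.MULTILINE):
    # split "heading/*subheading" values into separate terms
    res.extend(i.split("/*"))
print(res)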

Output:

['Adult', 'Biomedical Research', 'organization & administration', 'Female', 'Health Care Reform', 'history', 'methods']
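
A side note on the pattern: re.MULTILINE only changes how ^ and $ behave, so it matters only when the pattern uses one of those anchors. Below is a minimal sketch of what the ^ anchor guards against. The two-line sample is made up for illustration and is not taken from the question.

```python
import re

# Hypothetical sample: the second line mentions "MH - " in the middle of the text.
sample = "MH  - Adult\nAB  - see MH - Female for details"

# Without an anchor, the pattern can match anywhere inside a line.
unanchored = re.findall(r"MH\s+-\s+(.*)", sample)

# With ^ and re.MULTILINE, only lines that start with MH are matched.
anchored = re.findall(r"^MH\s+-\s+(.*)", sample, flags=re.MULTILINE)

print(unanchored)  # ['Adult', 'Female for details']
print(anchored)    # ['Adult']
```

On the sample data from the question both variants give the same result; the anchor only makes a difference if "MH - " can also appear inside another field.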

Answer 2

It looks like you need a bit of regex, since you also want to split some of the lines on /*. This should do the trick!

import re

my_file = """JT  - American journal of public health
JID - 1254074
SB  - AIM
SB  - IM
MH  - Adult
MH  - Biomedical Research/*organization & administration
MH  - Female
MH  - Health Care Reform/*history/*methods
AB  - OBJECTIVES: We assessed whether a 2-phase labeling and choice
AB-  architecture intervention
OWN - NLM
STAT- MEDLINE
DCOM- 20120417
LR  - 20170220
IS  - 1541-0048 (Electronic)"""

my_list = my_file.splitlines()

new_list = []

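for item in my_list:
    # keep only lines whose tag is MH
    if re.search(r"^MH\s*-", item):
        # strip only the leading "MH  - " prefix, so hyphens inside a term survive
        item = re.sub(r"^MH\s*-\s*", "", item)
        # split "heading/*subheading" values into separate terms
        new_list.extend(item.split("/*"))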

print(new_list)

Output:

['Adult', 'Biomedical Research', 'organization & administration', 'Female', 'Health Care Reform', 'history', 'methods']

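I take the string and turn it into a list of lines. Chances are that when you import the file you can read it in as a list directly. I also like to run the regex on one line at a time, simply because that makes troubleshooting easier later on.

I match the items that start with MH and strip their prefix. Then I split each item on /* and collect all the pieces into one flat list that can be used for the Excel export.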
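
The export step itself is not shown above. One possible sketch, assuming a CSV file is an acceptable way to get the list into Excel, uses the standard csv module. The function name write_terms_csv and the column header "term" are hypothetical names chosen for this example.

```python
import csv
import io

terms = ['Adult', 'Biomedical Research', 'organization & administration',
         'Female', 'Health Care Reform', 'history', 'methods']

def write_terms_csv(terms, handle):
    """Write one term per row under a header row (hypothetical column name)."""
    writer = csv.writer(handle)
    writer.writerow(["term"])
    for term in terms:
        writer.writerow([term])

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_terms_csv(terms, buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with open(path, "w", newline="") instead of the buffer; the csv documentation recommends newline="" for files handed to csv.writer. A true .xlsx workbook would need a third-party library, which is outside this sketch.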

Answer 3

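Just posting the code I tried; I only noticed afterwards that better answers had already been posted while I was writing mine.
Please don't judge. That's just what happens on SO.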

s = """
JT  - American journal of public health
JID - 1254074
MH  - Adult
MH  - Biomedical Research/*organization & administration
MH  - Health Care Reform/*history/*methods
AB  - OBJECTIVES: We assessed whether a 2-phase labeling and choice
"""

import re
import itertools
matches = re.findall(r"^MH[\s-]+(.*)$", s, re.MULTILINE)
splitmatches = [i.split(r"/*") for i in matches]
flattenedmatches = list(itertools.chain(*splitmatches))

print(flattenedmatches)

Output:

['Adult', 'Biomedical Research', 'organization & administration', 'Health Care Reform', 'history', 'methods']
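
All of the answers split on the literal "/*", which fits the sample in the question. If some lines used a plain "/" separator or a leading "*" (an assumption, since the question shows neither), those markers would end up in the output. Here is a hedged sketch of a more tolerant splitter; split_heading is a hypothetical helper name, and the inputs are invented for illustration.

```python
import re

def split_heading(value):
    """Split one MH value on "/" and drop any leading "*" from each part."""
    parts = re.split(r"/", value)
    # skip parts that are empty once "*" and spaces are removed
    return [p.lstrip("*").strip() for p in parts if p.strip("* ")]

# Invented inputs: a plain "/" separator and a leading "*"
print(split_heading("Health Care Reform/*history/methods"))
print(split_heading("*Adult"))
```

This would also split any term that legitimately contains a "/", so it is only suitable if the data never uses a slash inside a term.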

3 answers

Answer 1
2 2018-06-05 14:57:54

Answer 2
2 accepted 2018-06-05 15:01:41

Answer 3
1 2018-06-05 15:47:52