Extracting the strings that follow a piece of text using a regular expression in Python

Question

I have a doc file with the following structure:

This is a fairy tale written by

    John Doe and Mary Smith
    
    Auckland,somewhere
    
 This story is awesome

I want to extract the two lines of text, which are:

        John Doe and Mary Smith
        
        Auckland,somewhere

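and append these values to a list using a regex. The two lines I want to extract always sit between the lines This is a fairy tale and This story is awesome. How can I do this? I tried a few combinations with before_keyword, keyword, after_keyword = text.partition(regex), but had no luck.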

Answer 1

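You can use a regex with re.DOTALL so that . matches any character, including newlines. Once you have the text between the two delimiters, you can use a second regex, this time without re.DOTALL, to extract the lines that contain at least one non-whitespace character (\S).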

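import re

lst = []

# Read the whole file into a single string
with open('input.txt') as f:
    text = f.read()

# re.DOTALL lets (.*?) span the lines between the two delimiters
match = re.search(r'This is a fairy tale written by(.*?)This story is awesome',
                  text, re.DOTALL)

if match:
    # Keep only the lines that contain at least one non-whitespace character
    lst.extend(re.findall(r'.*\S.*', match.group(1)))

print(lst)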

This gives:

['    John Doe and Mary Smith', '    Auckland,somewhere']
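If the leading indentation is unwanted, or the same pair of delimiters can occur more than once, the same idea extends naturally. The following is only a sketch: the helper name extract_between and the sample string are made up for illustration, and it assumes the text has already been read into a string.

```python
import re

def extract_between(text, start, end):
    """Return the stripped non-blank lines found between each start/end pair."""
    # re.escape keeps any regex metacharacters in the delimiters literal.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    lines = []
    # re.DOTALL lets the lazy group span several lines.
    for block in re.finditer(pattern, text, re.DOTALL):
        # No re.DOTALL here, so '.' stops at newlines and we match line by line.
        for line in re.findall(r'.*\S.*', block.group(1)):
            lines.append(line.strip())
    return lines

sample = (
    "This is a fairy tale written by\n"
    "\n"
    "    John Doe and Mary Smith\n"
    "    \n"
    "    Auckland,somewhere\n"
    "    \n"
    " This story is awesome\n"
)

print(extract_between(sample,
                      'This is a fairy tale written by',
                      'This story is awesome'))
```

Calling strip() on each line removes the indentation that the original output keeps.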

Answer 2

You can start with this:

re.search(r'(?<=This is a fairy tale written by\n).*?(?=\n\s*This story is awesome)', s, re.MULTILINE|re.DOTALL).group(0)

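and fine-tune the regex from there. re.MULTILINE can be omitted, since you have no ^ or $ anyway, but re.DOTALL is needed to let . match newlines as well. The regex above uses a lookbehind, (?<=...), and a lookahead, (?=...). If you don't like that, you can use capturing parentheses instead.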
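For comparison, here is one way the capturing-parentheses variant might look. It is a sketch rather than a definitive version: the sample string s is made up to mirror the structure in the question, and the final list comprehension that drops blank lines is an addition for illustration.

```python
import re

s = (
    "This is a fairy tale written by\n"
    "\n"
    "    John Doe and Mary Smith\n"
    "    \n"
    "    Auckland,somewhere\n"
    "    \n"
    " This story is awesome\n"
)

# The delimiters are matched normally; only the parenthesised part
# (group 1) is kept.
m = re.search(r'This is a fairy tale written by\n(.*?)\n\s*This story is awesome',
              s, re.DOTALL)

between = m.group(1) if m else ''
# Split the captured block into lines and drop the blank ones.
authors = [line.strip() for line in between.splitlines() if line.strip()]
print(authors)
```

With a capturing group the result is read from group(1) instead of group(0).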

Answer 3

If you can build a list of strings from the doc file, you don't need a regex at all. Just run this simple program:

fileContent = ['This is a fairy tale written by','John Doe and Mary Smith','Auckland,somewhere','This story is awesome',
               'Some other things', 'story texts', 'Not Important data',
               'This is a fairy tale written by','Kem Cho?','Majama?','This story is awesome', 'Not important data']
               
authorsList = []
for i in range(len(fileContent)-3):
    if fileContent[i] == 'This is a fairy tale written by' and fileContent[i+3] == 'This story is awesome':
        authorsList.append([fileContent[i+1], fileContent[i+2]])

print(authorsList)

Here I simply check for 'This is a fairy tale written by' and 'This story is awesome', and if both are found, append the text between them to your list.

Output:

[['John Doe and Mary Smith', 'Auckland,somewhere'], ['Kem Cho?', 'Majama?']]
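The list of strings itself can be built from the file's text. The sketch below assumes the text has already been read into a string; the sample text is made up, and stripping each line and dropping blank ones is an assumption chosen so that the exact string comparisons above keep working.

```python
text = (
    "This is a fairy tale written by\n"
    "\n"
    "    John Doe and Mary Smith\n"
    "    \n"
    "    Auckland,somewhere\n"
    "    \n"
    " This story is awesome\n"
)

# Strip each line and drop the blank ones so the delimiters compare equal
# to the literal strings used below.
fileContent = [line.strip() for line in text.splitlines() if line.strip()]

authorsList = []
for i in range(len(fileContent) - 3):
    if (fileContent[i] == 'This is a fairy tale written by'
            and fileContent[i + 3] == 'This story is awesome'):
        authorsList.append([fileContent[i + 1], fileContent[i + 2]])

print(authorsList)
```

This relies on there being exactly two non-blank lines between the delimiters, as in the question.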

Answer 4

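Try this instead. It should match anything between those two strings. Since the strings are on different lines, re.DOTALL is needed so that . also matches newlines.

re.search(r'(?<=This is a fairy tale).*?(?=This story is awesome)', text, re.DOTALL)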
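Because the two strings sit on different lines, it matters whether . is allowed to cross newlines. A quick check on a made-up sample string, offered as a sketch rather than a definitive test:

```python
import re

text = (
    "This is a fairy tale written by\n"
    "\n"
    "    John Doe and Mary Smith\n"
    "\n"
    "    Auckland,somewhere\n"
    "\n"
    " This story is awesome\n"
)
pattern = r'(?<=This is a fairy tale).*?(?=This story is awesome)'

# Without re.DOTALL, '.' cannot cross a newline, so nothing is found here.
no_flag = re.search(pattern, text)
# With re.DOTALL the match spans the lines between the two strings.
with_flag = re.search(pattern, text, re.DOTALL)

print(no_flag)
print(with_flag.group(0).split('\n'))
```

Note that the match begins right after This is a fairy tale, so it also contains the rest of that line (written by); the author lines still have to be picked out of the result.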

4 answers

Answer 1
1 accepted 2020-08-21 04:20:27

Answer 2
0 2020-08-21 04:18:47

Answer 3
0 2020-08-21 04:31:45

Answer 4
0 2020-08-21 05:13:30
