Extracting a list from text using regular expressions in Python

Question

I want to extract a list of tuples from the following string:

text='''Consumer Price Index:
        +0.2% in Sep 2020

        Unemployment Rate:
        +7.9% in Sep 2020

        Producer Price Index:
        +0.4% in Sep 2020

        Employment Cost Index:
        +0.5% in 2nd Qtr of 2020

        Productivity:
        +10.1% in 2nd Qtr of 2020

        Import Price Index:
        +0.3% in Sep 2020

        Export Price Index:
        +0.6% in Sep 2020'''

I am using `import re` for this.

The output should look like: [('Consumer Price Index', '+0.2%', 'Sep 2020'), ...]

I want to use the re.findall function to produce the output above. So far I have this:

re.findall(r"(:\Z)\s+(%\Z+)(\Ain )", text)

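Here I am trying to capture the characters before ':', then the characters before '%', then the characters after 'in'.

I really just don't know how to proceed. Any help would be appreciated. Thanks!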
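For reference, the anchors in the attempted pattern do not mean "everything before" or "everything after": `\A` matches only at the very start of the whole string and `\Z` only at the very end. Python also refuses to apply a quantifier to an anchor, so `\Z+` makes the pattern fail to compile. A small sketch with a made-up sample string:

```python
import re

sample = "Rate:\nIndex:"

# \Z matches only at the very end of the whole string,
# so only the final colon is found.
print(re.findall(r":\Z", sample))               # [':']

# To match a colon at the end of *each* line, use $ together with re.MULTILINE.
print(re.findall(r":$", sample, re.MULTILINE))  # [':', ':']

# A quantifier on an anchor (\Z+) is rejected when the pattern is compiled.
try:
    re.compile(r"(%\Z+)")
except re.error as exc:
    print("re.error:", exc)
```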

Answer 1

You can use

import re

re.findall(r'(\S.*):\n\s*(\+?\d[\d.]*%)\s+in\s+(.*)', text)
# => [('Consumer Price Index', '+0.2%', 'Sep 2020'), ('Unemployment Rate', '+7.9%', 'Sep 2020'), ('Producer Price Index', '+0.4%', 'Sep 2020'), ('Employment Cost Index', '+0.5%', '2nd Qtr of 2020'), ('Productivity', '+10.1%', '2nd Qtr of 2020'), ('Import Price Index', '+0.3%', 'Sep 2020'), ('Export Price Index', '+0.6%', 'Sep 2020')]

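See the regex demo and the Python demo.

Details

(\S.*) - Group 1: a non-whitespace character followed by zero or more characters other than line breaks, as many as possible
: - a colon
\n - a newline
\s* - zero or more whitespace characters
(\+?\d[\d.]*%) - Group 2: an optional +, a digit, zero or more digits/dots, and a %
\s+in\s+ - "in" surrounded by one or more whitespace characters on each side
(.*) - Group 3: zero or more characters other than line breaks, as many as possible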
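The same regex can be written with the `re.VERBOSE` flag so that each piece listed above carries its own comment. This is a sketch of the identical pattern, not a different technique; the group names `name`, `change` and `period` are illustrative choices, and the sample `text` is shortened:

```python
import re

text = '''Consumer Price Index:
        +0.2% in Sep 2020

        Employment Cost Index:
        +0.5% in 2nd Qtr of 2020'''

pattern = re.compile(r"""
    (?P<name>\S.*)             # group 1: a non-whitespace char, then the rest of the line
    :\n                        # a colon followed by a newline
    \s*                        # indentation on the next line
    (?P<change>\+?\d[\d.]*%)   # group 2: optional +, a digit, digits/dots, then %
    \s+in\s+                   # the word 'in' surrounded by whitespace
    (?P<period>.*)             # group 3: the rest of the line
""", re.VERBOSE)

print(pattern.findall(text))
# [('Consumer Price Index', '+0.2%', 'Sep 2020'), ('Employment Cost Index', '+0.5%', '2nd Qtr of 2020')]

# Named groups also give dict access per match.
for m in pattern.finditer(text):
    print(m.groupdict())
```

In verbose mode unescaped whitespace in the pattern is ignored and `#` starts a comment, which is why no literal spaces are needed here.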

Answer 2

Regular expressions are not a good fit for this problem. They become hard to read and maintain very quickly. Python's string methods make this cleaner:

list_of_lines = [
    line.strip()                 # remove trailing and leading whitespace
    for line in text.split("\n") # split up the text into lines
    if line.strip()              # filter out empty and whitespace-only lines
]

list_of_lines is now:

['Consumer Price Index:', '+0.2% in Sep 2020', 'Unemployment Rate:', '+7.9% in Sep 2020', 'Producer Price Index:', '+0.4% in Sep 2020', 'Employment Cost Index:', '+0.5% in 2nd Qtr of 2020', 'Productivity:', '+10.1% in 2nd Qtr of 2020', 'Import Price Index:', '+0.3% in Sep 2020', 'Export Price Index:', '+0.6% in Sep 2020']

Now all we have to do is build tuples from consecutive pairs of elements in this list.

def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)

(from here)

Now we can get the output we want:
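The `zip(a, a)` trick works because both arguments are the *same* iterator, so each tuple pulls the next two items from it. One consequence worth knowing is that an unpaired trailing item is silently dropped. A quick check on made-up lists:

```python
def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)   # one shared iterator
    return zip(a, a)     # each zip step takes two items from it

print(list(pairwise(["A:", "1", "B:", "2"])))  # [('A:', '1'), ('B:', '2')]
print(list(pairwise(["A:", "1", "B:"])))       # [('A:', '1')]  ("B:" is dropped)
```

Note that Python 3.10 added a different `itertools.pairwise`, which yields overlapping pairs (s0, s1), (s1, s2), ..., so it is not a drop-in replacement for this helper. On Python 3.12+, `itertools.batched(iterable, 2)` yields non-overlapping tuples, but unlike `zip(a, a)` it keeps a shorter final batch.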

print(list(pairwise(list_of_lines)))  # zip() returns a lazy iterator in Python 3, so wrap it in list()

[('Consumer Price Index:', '+0.2% in Sep 2020'), ('Unemployment Rate:', '+7.9% in Sep 2020'), ('Producer Price Index:', '+0.4% in Sep 2020'), ('Employment Cost Index:', '+0.5% in 2nd Qtr of 2020'), ('Productivity:', '+10.1% in 2nd Qtr of 2020'), ('Import Price Index:', '+0.3% in Sep 2020'), ('Export Price Index:', '+0.6% in Sep 2020')]
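These pairs still carry the trailing colon and keep the percentage and the period in one string, so they are not yet the `(name, change, period)` shape the question asks for. A minimal sketch of the remaining step using only string methods, assuming every value line has the form `<change> in <period>` (the sample `text` is shortened):

```python
text = '''Consumer Price Index:
        +0.2% in Sep 2020

        Employment Cost Index:
        +0.5% in 2nd Qtr of 2020'''

def pairwise(iterable):
    a = iter(iterable)
    return zip(a, a)

lines = [line.strip() for line in text.split("\n") if line.strip()]

result = []
for header, value in pairwise(lines):
    name = header.rstrip(":")                    # drop the trailing colon
    change, _, period = value.partition(" in ")  # split at the first " in "
    result.append((name, change, period))

print(result)
# [('Consumer Price Index', '+0.2%', 'Sep 2020'), ('Employment Cost Index', '+0.5%', '2nd Qtr of 2020')]
```

`str.partition` splits only at the first occurrence of the separator, so a period such as "2nd Qtr of 2020" stays intact.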


Answer 1: score 4, accepted, 2020-11-01 14:09:42

Answer 2: score 1, 2020-11-01 14:19:57
