Partitioning a string with a regular expression in Python

Question

I am trying to clean up a text string using Python's partition and a regular expression. For example:

testString = 'Tre Bröders Väg 6 2tr'
sep = '[0-9]tr'
head,sep,tail = testString.partition(sep)
head
>>>'Tre Br\xc3\xb6ders V\xc3\xa4g 6 2tr'

head still contains the 2tr that I want to remove. I am not very good with regular expressions, but shouldn't [0-9] do the trick?

The output I expect from this example is:

head
>>> 'Tre Br\xc3\xb6ders V\xc3\xa4g 6'

Answer 1

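str.partition does not support regular expressions. When you give it a string such as '[0-9]tr', it looks for that exact literal string in testString; it does not interpret it as a regex.

According to the str.partition documentation:

Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.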
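
A minimal sketch of that literal matching, using the test string from the question (the variable names are only illustrative):

```python
test_string = 'Tre Bröders Väg 6 2tr'

# The pattern text is searched for literally. It does not occur in the
# string, so the whole string comes back followed by two empty strings.
no_match = test_string.partition('[0-9]tr')

# A literal separator that really occurs in the string splits as documented.
literal_match = test_string.partition(' 2tr')

print(no_match)       # ('Tre Bröders Väg 6 2tr', '', '')
print(literal_match)  # ('Tre Bröders Väg 6', ' 2tr', '')
```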

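Since you say you only need head, you can use re.split() from the re module with maxsplit set to 1 and take the first element of the result, which should give you what you were trying to get from str.partition. Example: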

import re
testString = 'Tre Bröders Väg 6 2tr'
sep = '[0-9]tr'
head = re.split(sep, testString, maxsplit=1)[0]

Demo:

>>> import re
>>> testString = 'Tre Bröders Väg 6 2tr'
>>> sep = '[0-9]tr'
>>> head = re.split(sep, testString, maxsplit=1)[0]
>>> head
'Tre Bröders Väg 6 '

Answer 2

The plain re.split() approach

You can use re.split() to extract head.

import re

testString = 'Tre Bröders Väg 6 2tr'
sep = r'[0-9]tr'  # raw string: a good habit for regex patterns
head, tail = re.split(sep, testString)
head.strip()
>>>'Tre Bröders Väg 6'

The chocolate-sprinkled re.split() approach

If you capture sep with (), re.split() behaves like a pseudo re.partition() (no such method actually exists in Python...).

import re

testString = 'Tre Bröders Väg 6 2tr'
sep = r'([0-9]tr)'  # "()" added.
head, sep, tail = re.split(sep, testString)
head, sep, tail
>>>('Tre Bröders Väg 6 ', '2tr', '')
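
One thing worth keeping in mind: the three-way unpacking above relies on the pattern matching exactly once. The sketch below, with made-up input strings, shows how the shape of the result changes otherwise and how maxsplit=1 keeps it at three parts whenever a match exists:

```python
import re

sep = r'([0-9]tr)'

# Exactly one match: three parts, so three-way unpacking works.
one = re.split(sep, 'Tre Bröders Väg 6 2tr')

# Two matches: five parts, so `head, sep, tail = ...` would raise ValueError.
two = re.split(sep, '1tr and 2tr')

# No match: a single part, and the unpacking fails again.
none = re.split(sep, 'no separator here')

# Stopping after the first split keeps the three-part shape.
first_only = re.split(sep, '1tr and 2tr', maxsplit=1)

print(one)         # ['Tre Bröders Väg 6 ', '2tr', '']
print(two)         # ['', '1tr', ' and ', '2tr', '']
print(none)        # ['no separator here']
print(first_only)  # ['', '1tr', ' and 2tr']
```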

Answer 3

For those still looking for an answer on how to do a regex partition, try the following function:

import regex # re also works

def regex_partition(content, separator):
    separator_match = regex.search(separator, content)
    if not separator_match:
        return content, '', ''

    matched_separator = separator_match.group(0)
    parts = regex.split(matched_separator, content, 1)

    return parts[0], matched_separator, parts[1]

Answer 4

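I came here looking for a way to do a regex-based partition().

As covered in yelichi's answer, re.split() returns the separators as well if the pattern contains a capturing group, so the most basic way to build a regex-based partition function is: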

re.split("(%s)" % sep, testString, maxsplit=1)

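However, this only works for simple regexes. If the regex you split on uses groups of its own (even non-capturing ones), it will not give the expected result.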
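
As a small illustration of the capturing-group case, assume the separator pattern captures the digit on its own. re.split() then returns the text of every group, so the result no longer has the three-part shape:

```python
import re

text = 'Tre Bröders Väg 6 2tr'
sep = r'([0-9])tr'  # the separator pattern has a capturing group of its own

# After wrapping, the pattern is '(([0-9])tr)', which has two groups, and
# re.split() includes the text of both in the result.
parts = re.split('(%s)' % sep, text, maxsplit=1)

print(parts)  # ['Tre Bröders Väg 6 ', '2tr', '2', '']
```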

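I first looked at the function provided in skia.heliou's answer, but it needlessly runs the regex a second time and, more importantly, it fails if the pattern does not match itself (it should use str.split on matched_separator, not re.split).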
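
A short sketch of that failure mode, using a made-up input where the matched text contains a regex metacharacter:

```python
import re

content = 'x a+b y'
match = re.search(r'a\+b', content)  # matches the literal text 'a+b'
matched_separator = match.group(0)

# Reused as a pattern, 'a+b' means "one or more 'a', then 'b'". That does
# not occur in content, so nothing is split and parts[1] would not exist.
as_pattern = re.split(matched_separator, content, maxsplit=1)

# Treating the matched text as a literal gives the intended two parts.
as_literal = content.split(matched_separator, 1)

print(as_pattern)  # ['x a+b y']
print(as_literal)  # ['x ', ' y']
```

Escaping the matched text with re.escape() before passing it to re.split() would also avoid the problem, although the regex would still run twice.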

So I implemented my own regex-aware version of partition():

import re

def re_partition(pattern, string, return_match=False):
    '''Function akin to partition() but supporting a regex
    :param pattern: regex used to partition the string
    :param string: string being partitioned
    :param return_match: if True, return the match object instead of the matched text
    '''

    match = re.search(pattern, string)

    if not match:
        return string, '', ''

    return string[:match.start()], match if return_match else match.group(0), string[match.end():]

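As an added feature, this can return the match object itself rather than just the matched string, which lets you work directly with the separator's groups.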
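
A possible usage sketch follows. The function is repeated in condensed form so the snippet runs on its own, and the named group floor is an illustrative assumption, not something from the question:

```python
import re

def re_partition(pattern, string, return_match=False):
    # Same logic as re_partition() in this answer, condensed.
    match = re.search(pattern, string)
    if not match:
        return string, '', ''
    sep = match if return_match else match.group(0)
    return string[:match.start()], sep, string[match.end():]

# Ask for the match object so the separator's groups can be inspected.
head, m, tail = re_partition(r'(?P<floor>[0-9])tr', 'Tre Bröders Väg 6 2tr',
                             return_match=True)

print(head.strip())      # Tre Bröders Väg 6
print(m.group(0))        # 2tr
print(m.group('floor'))  # 2
```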

And in iterator form:

def re_partition_iter(pattern, string, return_match=False):
    '''Returns an iterator of re_partition() output'''

    pos = 0
    pattern = re.compile(pattern)
    while True:
        match = pattern.search(string, pos)
        if not match:
            if pos < len(string):  # remove this line if you prefer to receive an empty string
                yield string[pos:]
            break

        yield string[pos:match.start()]
        yield match if return_match else match.group(0)
        pos = match.end()
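
A usage sketch for the generator, again repeating it so the snippet is self-contained. The input string is made up, and the comparison with re.split() is only there to show how the two differ:

```python
import re

def re_partition_iter(pattern, string, return_match=False):
    # Copy of the generator from this answer.
    pos = 0
    pattern = re.compile(pattern)
    while True:
        match = pattern.search(string, pos)
        if not match:
            if pos < len(string):
                yield string[pos:]
            break
        yield string[pos:match.start()]
        yield match if return_match else match.group(0)
        pos = match.end()

text = '1tr and 2tr'
pieces = list(re_partition_iter(r'[0-9]tr', text))

# re.split() with a capturing group yields the same alternating sequence,
# plus a trailing empty string after the final separator.
split_pieces = re.split(r'([0-9]tr)', text)

print(pieces)        # ['', '1tr', ' and ', '2tr']
print(split_pieces)  # ['', '1tr', ' and ', '2tr', '']
```

Note that the generator assumes every match is non-empty: a pattern that can match the empty string would never advance pos, so the loop would not terminate.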

4 answers

Answer 1: 2 (accepted), 2015-09-26 11:13:24
Answer 2: 1, 2021-08-09 13:06:51
Answer 3: 0, 2018-07-17 15:03:40
Answer 4: 0, 2021-11-11 03:42:30
