Finding overlapping sequences with regular expressions in Python

Question

I'm trying to extract numbers from a string, together with the characters immediately before and after each number (excluding digits and whitespace). The function should return a list of tuples, each of the form:

(previous_sequence, number, next_sequence)

For example:

string = '200gr T34S'
my_func(string)
>>[('', '200', 'gr'), ('T', '34', 'S')]

My first attempt was:

import re

def my_func(string):
    return re.findall(r'([^\d\s]+)?(\d+)([^\d\s]+)?', string)

But this function doesn't do what I expect. When I pass a string like '2AB3', I would like the output [('','2','AB'), ('AB','3','')], but instead I get [('','2','AB'), ('','3','')], because 'AB' was already consumed by the previous match.

How can I fix this?
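To see why 'AB' goes missing, it can help to look at the span of each match. The sketch below uses the pattern from the question; `match_spans` is a hypothetical helper name introduced only for illustration.

```python
import re

# Pattern from the question: optional prefix, digits, optional suffix.
pattern = re.compile(r'([^\d\s]+)?(\d+)([^\d\s]+)?')

def match_spans(text):
    """Return (start, end, matched_text) for each non-overlapping match."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

print(match_spans('2AB3'))
# [(0, 3, '2AB'), (3, 4, '3')]
```

The first match covers '2AB', so the scan resumes at '3' and there is nothing left to capture as its prefix.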

Answer 1

Instead of the quantifiers + and ?, you can simply use *:

>>> re.findall(r'([^\d\s]*)(\d+)([^\d\s]*)',string)
[('', '200', 'gr'), ('T', '34', 'S')]

But if you want to match overlapping strings, you can use a positive lookahead to find all the overlapping matches:

>>> re.findall(r'(?=([^\d\s]*)(\d+)([^\d\s]*))','2AB3')
[('', '2', 'AB'), ('AB', '3', ''), ('B', '3', ''), ('', '3', '')]
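One caveat worth checking before relying on this: because the whole pattern sits inside a lookahead, nothing is consumed and a match is attempted at every position. With multi-digit numbers that also yields the tails of each number. A small sketch (the name `all_overlaps` is made up for this example):

```python
import re

# Same lookahead pattern as above, compiled once.
OVERLAP = re.compile(r'(?=([^\d\s]*)(\d+)([^\d\s]*))')

def all_overlaps(text):
    """Return every tuple the lookahead pattern finds, one per start position."""
    return OVERLAP.findall(text)

print(all_overlaps('200gr'))
# [('', '200', 'gr'), ('', '00', 'gr'), ('', '0', 'gr')]
```

So this form suits inputs where every overlap is wanted; if only one tuple per number is needed, the extra entries have to be filtered out or avoided with a different pattern.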

Answer 2

Since the numbers themselves never overlap, a single trailing lookahead assertion should be all you need.

Something like ([^\\d\\s]+)?(\\d+)(?=([^\\d\\s]+)?)

Or ([^\\d\\s]*)(\\d+)(?=([^\\d\\s]*)) if you care about the difference between NULL and the empty string.
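As a minimal sketch of how this could look in Python, assuming the pattern is written as a raw string (so single backslashes) and reusing the `my_func` name from the question:

```python
import re

# The prefix and the digits are consumed as usual; the suffix is captured
# inside a lookahead, so it stays available as the next match's prefix.
TRAILING = re.compile(r'([^\d\s]*)(\d+)(?=([^\d\s]*))')

def my_func(string):
    """Return (previous_sequence, number, next_sequence) tuples."""
    return TRAILING.findall(string)

print(my_func('2AB3'))        # [('', '2', 'AB'), ('AB', '3', '')]
print(my_func('200gr T34S'))  # [('', '200', 'gr'), ('T', '34', 'S')]
```

With `re.findall`, a group that did not take part in the match is reported as an empty string, so in Python the `+)?` and `*` variants should give the same tuples.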

Answer 3

Another way is to combine a regex split with a plain function:

import re

# Sample inputs: '200gr T34S', '2AB3'


def s(x):
    tmp = []
    # With one capture group, re.split alternates text and separators:
    # even indexes hold text, odd indexes hold a digit run
    # (or None when the separator was whitespace).
    d = re.split(r'\s+|(\d+)', x)
    for pos in range(1, len(d), 2):
        if d[pos] is None:  # separator was whitespace, not a number
            continue
        # Indexing by position avoids d.index(), which always returns the
        # first occurrence and so mishandles repeated numbers.
        tmp.append((d[pos - 1], d[pos], d[pos + 1]))
    return tmp

print(s('2AB3'))

Prints:

[('', '2', 'AB'), ('AB', '3', '')]


Answer 1: 1, 2015-11-05 18:57:58
Answer 2: 1, accepted
Answer 3: 0, 2015-11-05 20:21:49
