How to find the spans of multiple substrings in a string using Python?

Question

I have a text like

xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx

and another string

bp 160/110 hgba1c 12%

Now, how can I get the span of each finding, as below?

[(5, 15), (62, 72)]

Note: The patterns above can vary a lot, so I am looking for a dynamic solution.

Thanks in advance.

Answer 1

This function finds the left and right bounds of a window in a that contains every character of b (if there is no such window, it returns False):

import collections

def find_substring_bounds(a, b):
    need = collections.Counter(b)  # characters of b still to be found
    missing = len(b)               # how many of them are outstanding
    for end, char in enumerate(a, 1):
        if need[char] > 0:
            missing -= 1
        need[char] -= 1
        if missing == 0:  # a[:end] now holds every character of b
            start = 0
            # drop surplus characters from the left edge of the window
            while start < end and need[a[start]] < 0:
                need[a[start]] += 1
                start += 1
            return start, end
    return False
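
One point that may be worth spelling out: the window test above compares character counts, not character order, so any rearrangement of b's characters also counts as "found". A minimal sketch of that criterion, using a hypothetical helper covers that is not part of the original answer:

```python
from collections import Counter

def covers(window, query):
    """Hypothetical helper: True if `window` has at least as many of each
    character as `query`, regardless of the order they appear in."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means nothing from `query` is still missing.
    return not (Counter(query) - Counter(window))

print(covers("abc", "cab"))   # order is ignored
print(covers("abc", "abcc"))  # but multiplicity is respected
```

This is why the second function is still needed to compare the actual characters at each end of the window.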

We then need to find the middle-left and middle-right bounds, that is, where the match at the left edge of the window ends and where the match at the right edge begins:

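def mid_left_mid_right(a, b, left, right):
    # walk forward from `left`; on a mismatch, mid_left is the number of
    # leading characters of b that matched
    mid_left = left
    for mid_left, (c1, c2) in enumerate(zip(a[left:], b)):
        if c1 != c2:
            break
    # walk backward from `right`; on a mismatch, mid_right is the number of
    # trailing characters of b that matched
    mid_right = right
    for mid_right, (c1, c2) in enumerate(zip(a[:right][::-1], b[::-1])):
        if c1 != c2:
            break
    return [(left, left+mid_left), (right-mid_right, right)]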

Example:

s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
s2 = "bp 160/110 hgba1c 12%"
left_, right_ = find_substring_bounds(s1.lower(), s2)
res = mid_left_mid_right(s1.lower(), s2, left_, right_)
print(res)

Outputs:

[(5, 16), (61, 72)]

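Note that each span here includes the space next to the match, which is why the result differs by one from the spans in the question. You may need to amend this for any edge cases in your dataset.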
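
As an alternative sketch (not part of the original answer), the standard library's difflib.SequenceMatcher can report the blocks that two strings share. The helper name matching_spans and the min_len threshold below are assumptions made for illustration. The sketch also assumes that lowercasing does not change the string length, which holds for ASCII text like the example.

```python
from difflib import SequenceMatcher

def matching_spans(text, query, min_len=3):
    """Hypothetical helper: spans in `text` of the blocks it shares with `query`."""
    lowered = text.lower()  # assumes lower() preserves length (true for ASCII)
    matcher = SequenceMatcher(None, lowered, query.lower(), autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        start, end = block.a, block.a + block.size
        # trim whitespace at either edge of the shared block
        while start < end and lowered[start].isspace():
            start += 1
        while end > start and lowered[end - 1].isspace():
            end -= 1
        # skip very short blocks, including the zero-length terminator
        if end - start >= min_len:
            spans.append((start, end))
    return spans

s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
s2 = "bp 160/110 hgba1c 12%"
print(matching_spans(s1, s2))
```

On the example strings this should give the spans asked for in the question, [(5, 15), (62, 72)]. With noisier inputs the matcher may return extra or fragmented blocks, so min_len would need tuning.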

Solution 1
0 2022-09-05 20:19:25