How to find the spans of multiple substrings in one string using Python?

Question

I have a text like

xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx

and another string

bp 160/110 hgba1c 12%

Now, how can I get the span of each finding, as below?

[(5, 15), (62, 72)]

Note: The patterns above can vary a lot, so I want a dynamic solution.

Thanks in advance

Answer 1

This function finds the left and right bounds of the first window in a that contains every character of b (if there is no such window, it returns False):

import collections

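def find_substring_bounds(a, b):
    """Return (start, end) of the first window of a holding every character of b, else False."""
    need = collections.Counter(b)  # characters of b still to be covered
    missing = len(b)  # how many of those characters are still missing
    for end, char in enumerate(a, 1):
        if need[char] > 0:
            missing -= 1
        need[char] -= 1  # a negative count marks a surplus character
        if missing == 0: # found all the characters
            start = 0
            # drop leading characters that the window can spare
            while start < end and need[a[start]] < 0:
                need[a[start]] += 1
                start += 1
            need[a[start]] += 1
            return start, end
    return False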
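
The bookkeeping above relies on two properties of collections.Counter: a missing key reads as 0, and counts are allowed to go negative. A minimal sketch of that behaviour, using a made-up two-character query purely for illustration:

```python
import collections

# Hypothetical tiny query, only to show how the counts evolve.
need = collections.Counter("bp")

need["x"] -= 1  # 'x' is not in the query: missing key reads as 0, so it drops to -1
need["b"] -= 1  # 'b' is in the query: 1 -> 0, meaning this character is now covered

print(need["x"], need["b"], need["p"])  # -1 0 1
```

A negative count therefore means "more of this character than needed", which is what lets the while loop trim the left edge of the window.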

We then need to find the middle-left and middle-right bounds, i.e. where the first finding ends and where the last finding begins:

def mid_left_mid_right(a, b, left, right):
    # walk forward from the left bound until a and b stop matching
    mid_left = left
    for mid_left, (c1, c2) in enumerate(zip(a[left:], b)):
        if c1 != c2:
            break
    # walk backward from the right bound until a and b stop matching
    mid_right = right
    for mid_right, (c1, c2) in enumerate(zip(a[:right][::-1], b[::-1])):
        if c1 != c2:
            break
    return [(left, left+mid_left), (right-mid_right, right)]

Example:

s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
s2 = "bp 160/110 hgba1c 12%"
left_, right_ = find_substring_bounds(s1.lower(), s2)
res = mid_left_mid_right(s1.lower(), s2, left_, right_)
print(res)

Outputs:

[(5, 16), (61, 72)]
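
To see what these spans cover, it can help to slice the original string with them. This is only a quick check, with the span values copied from the output above:

```python
s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
spans = [(5, 16), (61, 72)]  # copied from the printed result

# Slice the text with each span to inspect what was matched.
slices = [s1[start:end] for start, end in spans]
print(slices)  # ['BP 160/110 ', ' HgbA1c 12%']
```

Each slice carries the space that separates the two findings in s2, which is why the numbers differ by one from the spans asked for in the question.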

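You may need to amend this for edge cases in your dataset. For example, find_substring_bounds returns False when no window is found, so unpacking its result into left_ and right_ raises a TypeError; the returned spans include the space adjoining each finding; and mid_left_mid_right always returns exactly two spans, so it assumes exactly two findings.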
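
If the number of findings can vary, one alternative (a different technique from the functions above, offered only as a sketch) is to search for each whitespace-separated token of the second string in order, ignoring case, and merge token spans that are separated only by whitespace in the text. The function name finding_spans is hypothetical, and the sketch assumes the tokens appear in the text in the same order as in the query:

```python
import re

def finding_spans(text, query):
    """Hypothetical helper: merged (start, end) spans of the query tokens, searched in order."""
    spans = []
    pos = 0
    for token in query.split():
        # Case-insensitive literal search, starting after the previous token.
        match = re.compile(re.escape(token), re.IGNORECASE).search(text, pos)
        if match is None:
            return []  # assumption: treat any missing token as "no result"
        start, end = match.span()
        if spans and text[spans[-1][1]:start].isspace():
            # Only whitespace since the previous token: extend that span.
            spans[-1] = (spans[-1][0], end)
        else:
            spans.append((start, end))
        pos = end
    return spans

s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
s2 = "bp 160/110 hgba1c 12%"
print(finding_spans(s1, s2))  # [(5, 15), (62, 72)]
```

Because the search takes the first occurrence of each token, a short token such as a bare number could match inside an unrelated value (a date, for instance), so this would need tightening for real data.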
