
How to find the span of multiple sub-strings in one string using Python?

I have a text like

xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx

and another string

bp 160/110 hgba1c 12%

Now, how can I get the span of each finding, as below?

[(5, 15), (62, 72)]

Note: The patterns above can vary a lot, so I am looking for a dynamic solution.

Thanks in advance.

This function finds the start and end of the first window of the text that contains every character of the substring (if there is no such window, it returns False):

import collections

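def find_substring_bounds(a, b):
    # need[c] > 0 means character c of b is still missing from the window
    need = collections.Counter(b)
    missing = len(b)  # how many characters of b are not yet covered
    for end, char in enumerate(a, 1):
        if need[char] > 0:
            missing -= 1
        need[char] -= 1
        if missing == 0: # found all the characters
            # drop surplus characters from the left edge of the window
            start = 0
            while start < end and need[a[start]] < 0:
                need[a[start]] += 1
                start += 1
            need[a[start]] += 1
            return start, end
    return False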

We then need to find the middle-left and middle-right bounds, i.e. where the first finding ends and where the last finding begins:

def mid_left_mid_right(a, b, left, right):
    # scan forward from `left` until a and b stop matching
    mid_left = left
    for mid_left, (c1, c2) in enumerate(zip(a[left:], b)):
        if c1 != c2:
            break
    # scan backward from `right` until the reversed strings stop matching
    mid_right = right
    for mid_right, (c1, c2) in enumerate(zip(a[:right][::-1], b[::-1])):
        if c1 != c2:
            break
    return [(left, left+mid_left), (right-mid_right, right)]
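The forward scan amounts to measuring the common prefix of two strings. A minimal sketch of that step in isolation, using a hypothetical helper `common_prefix_len` that is not part of the answer. It counts matching characters explicitly, so a complete match returns the full length, whereas the `enumerate`-based loop would stop at the last index:

```python
def common_prefix_len(x, y):
    """Return how many leading characters x and y have in common."""
    n = 0
    for c1, c2 in zip(x, y):
        if c1 != c2:
            break
        n += 1
    return n

text = "xxxx bp 160/110 12/6/2018"
query = "bp 160/110 hgba1c 12%"
# "bp 160/110 " (including the trailing space) matches before '1' vs 'h' differ
print(common_prefix_len(text[5:], query))  # 11
```

The same helper applied to the reversed strings would give the length of the common suffix.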

Example:

s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
s2 = "bp 160/110 hgba1c 12%"
left_, right_ = find_substring_bounds(s1.lower(), s2)
res = mid_left_mid_right(s1.lower(), s2, left_, right_)
print(res)

Outputs:

[(5, 16), (61, 72)]

You may need to amend this for any edge cases in your dataset.
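If the findings are available as separate phrases rather than one combined string (an assumption; the question only shows them joined together), a simpler alternative is a case-insensitive search with `re`. This is a sketch, not the method used above, and `phrase_spans` is a hypothetical name:

```python
import re

def phrase_spans(text, phrases):
    """Return the (start, end) span of the first case-insensitive match of
    each phrase, or None for a phrase that does not occur in text."""
    spans = []
    for phrase in phrases:
        # re.escape treats characters such as '/' and '%' literally
        match = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
        spans.append(match.span() if match else None)
    return spans

s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
print(phrase_spans(s1, ["bp 160/110", "hgba1c 12%"]))  # [(5, 15), (62, 72)]
```

Unlike the window-based approach, these spans exclude the surrounding spaces, but the phrases must appear verbatim (ignoring case) in the text.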
