How to find the spans of multiple substrings in one string using Python?
I have a text like
xxxx BP 160/110
12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12%
on 21/1/2019 xxxx
and another string
bp 160/110 hgba1c 12%
Now, how can I get the span of each finding, as below?
[(5, 15), (62, 72)]
Note: The patterns above can vary a lot, so I want a dynamic solution.
Thanks in advance.
This function finds the start and end bounds of the first window in a that contains every character of b (it returns False if there is no such window):
import collections

def find_substring_bounds(a, b):
    need = collections.Counter(b)  # characters of b still to be found
    missing = len(b)               # how many of them are outstanding
    for end, char in enumerate(a, 1):
        if need[char] > 0:
            missing -= 1
        need[char] -= 1  # surplus characters go negative
        if missing == 0:  # found all the characters
            # Drop surplus characters from the left of the window.
            start = 0
            while start < end and need[a[start]] < 0:
                need[a[start]] += 1
                start += 1
            return start, end
    return False
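To see why the shrinking step tests for negative counts, here is a minimal illustration of the Counter bookkeeping. The toy strings are made up for this example; only need mirrors a name from the function above.

```python
import collections

# Hypothetical toy input: we want the characters of "ab" inside "xxab".
need = collections.Counter("ab")
for char in "xxab":
    need[char] -= 1  # every scanned character is subtracted, wanted or not

# Wanted characters are back at 0; the surplus 'x' has gone negative.
surplus = sorted(c for c, n in need.items() if n < 0)
print(dict(need))  # {'a': 0, 'b': 0, 'x': -2}
print(surplus)     # ['x']
```

A character with a negative count is one the window holds more often than b requires, so it can be dropped from the left edge without losing coverage.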
We then need the inner bounds: where the match with the start of b ends (middle-left) and where the match with the end of b begins (middle-right):
def mid_left_mid_right(a, b, left, right):
    # Length of the common prefix of a[left:] and b.
    mid_left = 0
    for c1, c2 in zip(a[left:], b):
        if c1 != c2:
            break
        mid_left += 1
    # Length of the common suffix of a[:right] and b.
    mid_right = 0
    for c1, c2 in zip(a[:right][::-1], b[::-1]):
        if c1 != c2:
            break
        mid_right += 1
    return [(left, left + mid_left), (right - mid_right, right)]
Example:
s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
s2 = "bp 160/110 hgba1c 12%"
left_, right_ = find_substring_bounds(s1.lower(), s2)
res = mid_left_mid_right(s1.lower(), s2, left_, right_)
print(res)
Output:
[(5, 16), (61, 72)]
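Each span here includes the space that separates the two findings, which is why it differs by one from the (5, 15) and (62, 72) in the question. You may need to amend this for any edge cases in your dataset.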
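As one possible amendment, the matching could be delegated to the standard library's difflib.SequenceMatcher, which reports every matching block rather than just two. This is a different technique from the functions above, shown only as a rough sketch: matching_spans and its min_size threshold are hypothetical names introduced for this example, and it assumes that lowercasing does not change the string length (true for ASCII text like the sample).

```python
import difflib

def matching_spans(text, query, min_size=3):
    """Return (start, end) spans of text that match pieces of query.

    min_size is a hypothetical threshold that drops tiny accidental matches.
    """
    lowered = text.lower()  # assumes lower() keeps the length unchanged
    matcher = difflib.SequenceMatcher(None, lowered, query.lower(), autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        if block.size < min_size:
            continue  # also skips the zero-length sentinel block at the end
        start, end = block.a, block.a + block.size
        # Trim whitespace at the edges of the match.
        while start < end and lowered[start].isspace():
            start += 1
        while end > start and lowered[end - 1].isspace():
            end -= 1
        if start < end:
            spans.append((start, end))
    return spans

s1 = "xxxx BP 160/110 12/6/2018 sitting left arm @xyz hospital xxxx HgbA1c 12% on 21/1/2019 xxxx"
s2 = "bp 160/110 hgba1c 12%"
print(matching_spans(s1, s2))  # [(5, 15), (62, 72)]
```

For the sample strings this gives the spans asked for in the question, because the whitespace at the edges of each block is trimmed. Real data with partially matching findings may split into more, smaller blocks, so min_size would need tuning.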