Regex - Exclude middle pattern
I'm struggling with excluding or ignoring a certain pattern.
In Excel there are many timestamps, each followed by an ID, for example:
[0:02:25] 10652A
Sometimes there is a mistake where the line is mixed up like this:
1 [0:03:23] 0652A
Here the 1 belongs to 0652A, so the ID should be 10652A.
How can I complete my code so that the middle timestamp part is ignored in these cases and the ID is matched correctly?
This is what I've got so far:
starting_digits = re.search(r"^(\d+)", prefix)
id_code = re.search(r"(\d{2,4}.{1,3}):", prefix).group(1)
Thank you in advance!
Here is a pattern that matches all text between square brackets:
\[.*\]
Use:
import re

prefix = "1 [0:03:23] 0652A"
# group(1): optional leading characters, group(2): timestamp, group(3): rest of the ID
mobj = re.search(r"(\w+)?\s*\[(.*)\]\s*(\w+)", prefix)
id_code = mobj.group(1) + mobj.group(3) if mobj.group(1) else mobj.group(3)
timestamp = mobj.group(2)
print(id_code, timestamp)
This prints:
10652A 0:03:23
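The snippet above raises an AttributeError when a cell has no bracketed timestamp, because re.search then returns None. A minimal sketch of a more defensive variant follows; the helper name split_prefix and the stricter [\d:]+ timestamp class are assumptions for illustration, not part of the original answer.

```python
import re

# Optional leading characters, a bracketed timestamp made of digits and colons,
# then the rest of the ID. Anchored at the start of the string.
PATTERN = re.compile(r"^\s*(\w+)?\s*\[([\d:]+)\]\s*(\w+)")

def split_prefix(prefix):
    """Return (id_code, timestamp), or None if no bracketed timestamp is found."""
    mobj = PATTERN.search(prefix)
    if mobj is None:
        return None
    head = mobj.group(1) or ""  # group(1) is None when nothing precedes the bracket
    return head + mobj.group(3), mobj.group(2)

print(split_prefix("1 [0:03:23] 0652A"))   # ('10652A', '0:03:23')
print(split_prefix("[0:02:25] 10652A"))    # ('10652A', '0:02:25')
print(split_prefix("no timestamp here"))   # None
```

Returning None leaves it to the caller to decide whether to skip or flag cells that do not follow the expected layout.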
Instead of extracting the ID, you can simply erase the timestamp part, which matches \s*\[[\d:]+\]\s* (or \s*\[.*\]\s* if the brackets can contain more than digits and colons):
import re

reg = r"\s*\[.*\]\s*"  # the bracketed part plus any surrounding whitespace
prefix = "[0:03:23] 0652A"
print(re.sub(reg, "", prefix)) # 0652A
prefix = "1 [0:03:23] 0652A"
print(re.sub(reg, "", prefix)) # 10652A
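If the values come from a whole column, the same substitution can be applied per cell. The sketch below is only an illustration under assumptions: the helper clean_id and the sample list cells are hypothetical, and it uses the stricter [\d:]+ class because a greedy .* would swallow everything between the first "[" and the last "]" if a cell ever contained two bracketed parts.

```python
import re

# Bracketed run of digits and colons, plus surrounding whitespace.
TIMESTAMP = re.compile(r"\s*\[[\d:]+\]\s*")

def clean_id(cell):
    """Remove the first bracketed timestamp and join what remains into one ID."""
    return TIMESTAMP.sub("", cell, count=1).strip()

cells = ["[0:02:25] 10652A", "1 [0:03:23] 0652A"]
print([clean_id(c) for c in cells])  # ['10652A', '10652A']
```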