
Regex: exclude a middle pattern

I'm struggling with excluding or ignoring a certain pattern.

In Excel there are many timestamps, each followed by an ID,

e.g.

[0:02:25] 10652A

Sometimes there is a mistake where the two are mixed up like this:

1 [0:03:23] 0652A

Here the 1 belongs to 0652A, so the ID should be 10652A.

How can I complete my code so that these mistakes (the timestamp in the middle) are ignored and the ID is matched correctly?

This is what I've got so far:

starting_digits = re.search(r"^(\d+)", prefix)
id_code = re.search(r"(\d{2,4}.{1,3}):", prefix).group(1)

Thank you in advance!

Here is a pattern that matches all the text between square brackets:

\[.*\]

Use:

import re

prefix = "1 [0:03:23] 0652A"
mobj = re.search(r"(\w+)?\s*\[(.*)\]\s*(\w+)", prefix)

# Group 1 is the optional stray leading part, group 3 the rest of the ID.
id_code = mobj.group(1) + mobj.group(3) if mobj.group(1) else mobj.group(3)
timestamp = mobj.group(2)

print(id_code, timestamp)

This prints:

10652A 0:03:23

You can test the regular expression here.
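
The same idea can be wrapped in a small function that handles both the normal and the mixed-up form. This is only a sketch: the name split_prefix is hypothetical, and the pattern assumes the bracketed part contains nothing but digits and colons.

```python
import re

# Hypothetical helper, not part of the original answer.
# Assumption: the bracketed timestamp consists only of digits and colons.
PREFIX_RE = re.compile(r"\s*(\w+)?\s*\[([\d:]+)\]\s*(\w+)\s*")

def split_prefix(prefix):
    """Return (id_code, timestamp), or None if the text does not match."""
    mobj = PREFIX_RE.fullmatch(prefix)
    if mobj is None:
        return None
    leading, timestamp, rest = mobj.groups()
    # A missing leading part is None, so fall back to an empty string.
    return (leading or "") + rest, timestamp

print(split_prefix("[0:02:25] 10652A"))    # ('10652A', '0:02:25')
print(split_prefix("1 [0:03:23] 0652A"))   # ('10652A', '0:03:23')
print(split_prefix("no timestamp here"))   # None
```

Returning None for text that does not match avoids the AttributeError that calling .group() on a failed re.search would raise.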

Instead of finding the content of the ID, you can simply erase the timestamp part, which matches \s*\[[\d:]+\]\s*:

  • any amount of whitespace
  • a left square bracket
  • one or more digits or colons
  • a right square bracket
  • any amount of whitespace

import re

reg = r"\s*\[[\d:]+\]\s*"

prefix = "[0:03:23] 0652A"
print(re.sub(reg, "", prefix))  # 0652A

prefix = "1 [0:03:23] 0652A"
print(re.sub(reg, "", prefix))  # 10652A
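
One thing to keep in mind when choosing what goes between the brackets: .* is greedy, so if a cell ever contained a second bracketed part, \[.*\] would swallow everything from the first [ to the last ]. The sketch below compares the two patterns on a made-up cell (the trailing "[checked]" is an invented example, not data from the question).

```python
import re

greedy = r"\s*\[.*\]\s*"       # anything between the first [ and the last ]
strict = r"\s*\[[\d:]+\]\s*"   # only digits and colons between the brackets

# Invented example with a second bracketed part after the ID.
cell = "1 [0:03:23] 0652A [checked]"

greedy_result = re.sub(greedy, "", cell)
strict_result = re.sub(strict, "", cell)

print(greedy_result)  # 1
print(strict_result)  # 10652A [checked]
```

For cells that only ever contain a single timestamp, both patterns give the same result.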
