
Python regex to find symbol digit symbol

I wrote this regex in Python and tested it out on regex101, but it is still not working the way I want:

((^[-\/\\\(\)\s\,\&\.]+)?([0-9]+)([-\/\\\(\)\s\,\&\.])+)

What I am trying to find is the pattern where the string optionally starts or ends with one of these symbols, and has ONLY digits in the middle:

-/\()& .

This list includes dash, forward slash, backslash, parentheses, ampersand, blank space, and period. A search should return True if the string contains ONLY digits in the middle, with optional punctuation at the beginning and/or end of the string.

This regex seems to work for most cases, but fails if I add a letter into the digits in the middle: it still ends up returning True. What should I change in this regex so that it only returns True for symbol (optional), all digits, symbol (optional)?

Cases where it should return True:

  1. symbol + digits, e.g. (9672
  2. only digits, e.g. 20427304 or 8
  3. digits + symbol, e.g. 345--
  4. symbol + digits + symbol, e.g. (67-.

Case where it should NOT return True (because of the 'y' in the string):

(678983y733)..

There are a few things in your regex that need to change.

  • First of all, you have way too much escaping going on, which makes the pattern hard to read. Inside a character class, most of those symbols do not need a backslash.

  • Secondly, the parentheses are doing odd things. You don't need a group surrounding the entire regex, because the whole match is already available as group 0 ($0).

  • Your last character class is not optional in your regex, so a string with no trailing symbol cannot match.

  • You need to surround everything with ^ and $ to ensure that the whole string is matched, not just part of it.
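To make the partial-match problem concrete, here is a small sketch that runs the question's original pattern through `re.search`. Using `re.search` is an assumption, since the question does not say which function was called; the variable name `original` is only for illustration.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"((^[-\/\\\(\)\s\,\&\.]+)?([0-9]+)([-\/\\\(\)\s\,\&\.])+)")

# search() may start anywhere, so it skips past the 'y' and matches the tail.
m = original.search("(678983y733)..")
print(m.group(0))  # 733)..

# The trailing character class is mandatory, so a bare number is rejected.
print(original.search("20427304"))  # None
```

The first result explains the unwanted True: only the part after the letter is matched. The second shows that the "only digits" case from the question never matches with this pattern.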

Here's what I came up with:

^([-/\\()\s,&.]*)([0-9]+)([-/\\()\s,&.]*)$

Note that ([something]+)? matches the same text as ([something]*), but the latter is more readable.
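As a quick check, the anchored pattern can be run against the examples from the question. This is a minimal sketch; the helper name `is_symbol_digit_symbol` is hypothetical and not part of the original answer.

```python
import re

# Anchored pattern from the answer: optional symbols, digits, optional symbols.
pattern = re.compile(r"^([-/\\()\s,&.]*)([0-9]+)([-/\\()\s,&.]*)$")

def is_symbol_digit_symbol(text):
    """Hypothetical helper: True when text is symbols, digits, symbols."""
    return pattern.search(text) is not None

samples = ["(9672", "20427304", "8", "345--", "(67-.", "(678983y733).."]
for sample in samples:
    print(repr(sample), is_symbol_digit_symbol(sample))
```

The first five samples should print True and the last one False, because the 'y' prevents the digits group from reaching the trailing symbols before `$`.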

I think what you are looking for is re.fullmatch.

import re
# Raw string keeps the backslash literal; space and period added from the question's list.
ponct = '[' + re.escape(r'-/\()& .') + ']*'
p = re.compile(ponct + '[0-9]+' + ponct)

Then p.fullmatch('(678983y733)') will return None, and all your other examples will return a match.
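The difference between `re.search`, `re.match`, and `re.fullmatch` is what matters here. The sketch below compares the three on the failing example; the symbol list is taken from the question and is an assumption about what should be allowed, and the names `symbols` and `bad` are only illustrative.

```python
import re

# Symbols taken from the question's list; an assumption about what is allowed.
symbols = "[" + re.escape(r"-/\()& .") + "]*"
p = re.compile(symbols + "[0-9]+" + symbols)

bad = "(678983y733).."
print(p.search(bad) is not None)     # True: finds a digit run somewhere inside
print(p.match(bad) is not None)      # True: matches the prefix "(678983"
print(p.fullmatch(bad) is not None)  # False: the whole string must fit
```

Only `fullmatch` requires the pattern to cover the entire string, so no `^` or `$` anchors are needed in the pattern itself.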

This allows you to find them embedded in a string, not just at the start. The ? allows zero or one symbol. Change this to * if you want zero or more leading/trailing symbols.

([-\\\/\&\.]?)\b([0-9]+)\b([-\\\/\&\.]?)
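A short sketch of how this pattern could be used to pull matches out of a longer string with `re.finditer`. The sample text is made up for illustration. Note that the character class in this pattern omits parentheses and whitespace, so it is narrower than the list in the question.

```python
import re

# Pattern from the answer: at most one symbol on each side of a whole-word digit run.
embedded = re.compile(r"([-\\\/\&\.]?)\b([0-9]+)\b([-\\\/\&\.]?)")

# Made-up sample text with one clean number and one number containing a letter.
text = "order -42. shipped, ref 678983y733 ignored"
for m in embedded.finditer(text):
    print(m.groups())  # ('-', '42', '.')
```

The `\b` word boundaries are what reject `678983y733`: digits and letters are both word characters, so there is no boundary between them.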
