How to search for data delimited by backslashes using regular expressions in Python

I am trying to extract one part of a path whose parts are separated by single backslashes. The part I need is a six-digit number. I need the backslashes in the pattern because I will run this code on more files, and those files may contain other six-digit (or longer) numbers elsewhere in the data.

Here is an example of the code:

>>> import arcpy, re
>>> layer = arcpy.mapping.Layer(r"J:\abcd\blabla.lyr")  # raw string: \a and \b would otherwise be escape sequences
>>> text = layer.dataSource
>>> print(text)
C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog\...
>>> result = re.search(r'([0-9]{6})', text)
>>> result.group(0)
u'416938'

But I would like to include the backslashes in the pattern, something like this (obviously this code doesn't work):

re.search (r'(\[0-9] {6}\)', text)

Any help is much appreciated. Thanks.

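You need to escape the backslashes. Also drop the space before {6`}`. As written, the pattern would match one digit followed by six literal spaces:

re.search(r'\\([0-9]{6})\\', text)  # \\ matches one literal backslash; group(1) holds just the six digits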
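Below is a minimal runnable sketch of the escaped-backslash pattern. The dataSource value printed in the question is truncated, so the full path that appears later in this thread stands in for the arcpy call. Placing the capture group inside the backslashes means group(1) returns only the digits.

```python
import re

# Full path as quoted later in the thread; a raw string keeps every backslash literal.
text = r"C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog"

# In the pattern, \\ matches one literal backslash; the group captures only the digits.
result = re.search(r"\\([0-9]{6})\\", text)
if result:
    print(result.group(0))  # the whole match, including both backslashes
    print(result.group(1))  # just the six digits
```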

Here is code you can use to extract a 6-digit number that is a whole word:

import re
p = re.compile(r'\b[0-9]{6}\b')  # \b = word boundary on each side of the six digits
test_str = r"C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog"  # raw string: single backslashes
match = p.search(test_str)
if match:
    print(match.group(0))


Note that \b (a word boundary) matches at the following positions:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.
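Here is a small demonstration of the word-boundary rules listed above, using made-up sample strings. Letters, digits and the underscore all count as word characters. As a result, \b finds no boundary between a letter and a digit, or between two digits.

```python
import re

word_six = re.compile(r"\b[0-9]{6}\b")

samples = [
    r"C:\Users\416938\AppData",  # backslash is a non-word character, so boundaries exist
    "id416938",                  # 'd' and '4' are both word characters: no boundary
    "4169380",                   # seven digits: no boundary after the sixth digit
]

results = []
for s in samples:
    m = word_six.search(s)
    found = m.group(0) if m else None
    results.append(found)
    print(repr(s), "->", found)
```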

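If you want to match a 6-digit sequence enclosed in backslashes (\...\), you can use a lookbehind and a lookahead. These check for the backslashes without including them in the match: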

(?<=\\)[0-9]{6}(?=\\)
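The sketch below contrasts the bare [0-9]{6} from the question with the lookaround version. The paths are hypothetical examples. The first contains a 7-digit folder, which is exactly the case the question worries about. The second shows a design difference from a pattern that consumes the backslashes, such as \\([0-9]{6})\\. Because lookarounds consume nothing, re.findall can return two adjacent six-digit segments that share a backslash.

```python
import re

# Hypothetical path: a 7-digit folder appears before the six-digit one.
text = r"D:\archive\1234567\416938\layer.lyr"

bare = re.search(r"[0-9]{6}", text)
enclosed = re.search(r"(?<=\\)[0-9]{6}(?=\\)", text)
print(bare.group(0))      # first six digits of 1234567
print(enclosed.group(0))  # the six digits sitting between two backslashes

# Hypothetical string with two adjacent six-digit segments sharing one backslash.
adjacent = r"a\123456\654321\b"
with_lookarounds = re.findall(r"(?<=\\)[0-9]{6}(?=\\)", adjacent)  # both segments
with_consuming = re.findall(r"\\([0-9]{6})\\", adjacent)           # shared backslash consumed: first only
print(with_lookarounds, with_consuming)
```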

Or, if you want to match a 6-digit sequence that is not surrounded by other digits (e.g. one between letters), use this regex:

(?<!\d)[0-9]{6}(?!\d)

It contains two lookarounds. (?<!\d) makes sure there is no digit before the 6-digit sequence, and (?!\d) makes sure there is no digit after it.
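The same idea is checked below against a few made-up strings. Unlike \b, these lookarounds only look for digits, so a number squeezed between letters still matches. One caveat, assuming Python 3: in str patterns \d also matches non-ASCII decimal digits. Add the re.ASCII flag if that matters.

```python
import re

no_digit_neighbours = re.compile(r"(?<!\d)[0-9]{6}(?!\d)")

samples = [
    "id416938x",                 # between letters: matched (\b would fail here)
    "4169380",                   # seven digits: no match
    r"C:\Users\416938\AppData",  # between backslashes: matched
]

found = [no_digit_neighbours.findall(s) for s in samples]
print(found)
```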

If the Windows path always has the structure C:\Users\<six digits>\..., you can skip the escaped regex syntax entirely and split the string instead:

>>> text = r"C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog"
>>> match = text.split("\\")[2]  # split at \ and grab the third element
>>> match
'416938'
>>> if match.isdigit() and len(match) == 6:  # check for digits only and length 6
...     print(match)
...
416938
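The six-digit folder may not always be the third component. In that case, a small helper can scan every component instead of relying on a fixed index. This is a sketch, and six_digit_component is a hypothetical name. Like the session above, it uses str.isdigit(), which also accepts some non-ASCII digit characters (for example superscripts).

```python
from typing import Optional


def six_digit_component(path: str) -> Optional[str]:
    """Return the first backslash-separated component made of exactly six digits, or None."""
    for part in path.split("\\"):
        if len(part) == 6 and part.isdigit():
            return part
    return None


print(six_digit_component(r"C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog"))
print(six_digit_component(r"J:\abcd\blabla.lyr"))
```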

The technical posts on this site follow the CC BY-SA 4.0 license. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM