How to use a regular expression in Python to search for data separated by backslashes

Question

I am trying to extract one part of a path whose parts are separated by single backslashes. The part is always a six-digit number. I need to match the backslashes as well, because I will use this code on more files, whose paths might include other numbers with six (or more) digits.

Here is an example of the code:

>>> layer = arcpy.mapping.Layer("J:\abcd\blabla.lyr")
>>> print layer.dataSource
C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog\...
>>> result = re.search (r'([a-z]{1}[0-9]{6})', text)
>>> result.group(0)
u'416938'

But I would like to include the backslashes, like this (obviously this code doesn't work):

re.search (r'(\[0-9] {6}\)', text)

Any help is much appreciated. Thanks.

Answer 1

You need to escape the backslashes:

re.search(r'(\\[0-9]{6}\\)', text)
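
As a minimal sketch of this approach (written for Python 3, with the sample path from the question assumed to be stored in a variable named text), the capture group can be moved inside the escaped backslashes so that group(1) returns only the digits. That placement is an adjustment for illustration, not part of the answer above:

```python
import re

# Sample path from the question; the raw string keeps the backslashes literal.
text = r"C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog"

# In a raw-string pattern, \\ matches exactly one literal backslash.
# The parentheses capture only the six digits between the two backslashes.
pattern = re.compile(r"\\([0-9]{6})\\")

match = pattern.search(text)
user_id = match.group(1) if match else None
print(user_id)
```

Here group(0) would still contain the surrounding backslashes, while group(1) holds the number alone.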

Answer 2

Here is code you can use to extract a 6-digit number that is a whole word:

import re
p = re.compile(r'\b[0-9]{6}\b')  # six digits with a word boundary on each side
test_str = r"C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog"
match = p.search(test_str)
if match:
    print(match.group(0))

See the IDEONE demo.

Note that \b, a word boundary, matches at the following positions:

Before the first character in the string, if the first character is a word character.

After the last character in the string, if the last character is a word character.

Between two characters in the string, where one is a word character and the other is not a word character.
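
The three rules above can be checked with a small sketch. The sample strings are made up for illustration, and the expected values follow from the fact that letters, digits, and the underscore are word characters while backslashes, spaces, and dots are not:

```python
import re

word_six = re.compile(r"\b[0-9]{6}\b")

# Hypothetical sample strings mapped to whether the pattern should match.
expected = {
    r"C:\Users\416938\AppData": True,  # backslashes are non-word characters
    "416938": True,                    # boundaries at start and end of string
    "id 416938.": True,                # space and dot are non-word characters
    "a416938": False,                  # 'a' and '4' are both word characters
    "4169380": False,                  # seven digits: no boundary after six
    "_416938_": False,                 # underscore counts as a word character
}

results = {s: bool(word_six.search(s)) for s in expected}
for sample, matched in results.items():
    print(repr(sample), matched)
```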

If you want to match a 6-digit sequence inside \...\ you can use:

(?<=\\)[0-9]{6}(?=\\)
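
A short sketch of how this look-around pattern might be used with re.findall. The path below is a made-up example, extended with a longer number and a file name so the effect of the two assertions is visible:

```python
import re

# Hypothetical path containing several digit runs.
text = r"C:\Users\416938\AppData\Desktop10.0\123456789\file654321.txt"

# (?<=\\) requires a backslash just before the digits and (?=\\) one just
# after; neither backslash becomes part of the match itself.
between_backslashes = re.compile(r"(?<=\\)[0-9]{6}(?=\\)")

found = between_backslashes.findall(text)
print(found)
```

Only the number that fills a whole path component is returned; 123456789 is too long, and 654321 is not enclosed by backslashes.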

Or, if you want to match a 6-digit sequence that is not adjacent to other digits (e.g. between letters), use this regex:

(?<!\d)[0-9]{6}(?!\d)

It contains 2 look-arounds. (?<!\d) makes sure there is no digit before the 6-digit sequence and (?!\d) makes sure there is no digit after it.
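
To illustrate the difference from the \b version, here is a small sketch with a made-up input string in which the six-digit numbers touch letters:

```python
import re

# Hypothetical input: digits glued to letters, plus one seven-digit run.
text = "abc416938def 1234567 x654321"

# Digit look-arounds: only the neighbouring characters' "digit-ness" matters.
no_digit_neighbours = re.compile(r"(?<!\d)[0-9]{6}(?!\d)")
lookaround_hits = no_digit_neighbours.findall(text)

# Word boundaries: letters next to the digits prevent a match.
boundary_hits = re.findall(r"\b[0-9]{6}\b", text)

print(lookaround_hits)
print(boundary_hits)
```

The look-around pattern accepts 416938 and 654321 and rejects the seven-digit run, while the word-boundary pattern finds nothing in this string because every six-digit run is attached to a letter or to another digit.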

Answer 3

If the Windows path always has the structure C:\Users\[0-9]{6}\..., you can avoid complicated escaped regex syntax altogether:

>>> text = r"C:\Users\416938\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog"
>>> match = text.split("\\")[2]  # split at \ and grab the third element
>>> match
'416938'
>>> if match.isdigit() and len(match) == 6:  # check for digits and length 6
...
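
The same idea could be wrapped in a small helper. This is only a sketch: the function name user_id_from_path and its index parameter are hypothetical, and it assumes the number always sits at a fixed position in the path:

```python
def user_id_from_path(path, index=2):
    """Return the six-digit component at position `index`, or None.

    Hypothetical helper: assumes a backslash-separated Windows path
    such as C:\\Users\\416938\\... with the number as the third part.
    """
    parts = path.split("\\")
    if len(parts) <= index:
        return None  # path is shorter than expected
    candidate = parts[index]
    if len(candidate) == 6 and candidate.isdigit():
        return candidate
    return None


print(user_id_from_path(r"C:\Users\416938\AppData\Roaming"))
print(user_id_from_path(r"C:\Users\someone\AppData\Roaming"))
```

Returning None rather than raising keeps the call site simple when the code is run over many files whose paths may not follow the expected structure.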

Answer 1
2 2015-09-17 08:13:07

Answer 2
1 Accepted 2015-09-17 10:34:25

Answer 3
0 2015-09-17 11:13:30
