Python Regex Pattern Matching

I have a list in the following format:

data =['| test_data_14865428_0              |', '| test_data_1486612450_0              |', '| test_template                  |', '|id_1475020800_0              |']

I want to fetch all the list elements of the format test_data_* into a new list (tables). The list tables should store only the names, in the format test_data_*.

My try:

import re
tables = []
pattern = re.compile("| test_data\S")

for i in range(0, len(data)):
    if pattern.match(data[i]):
        tables.append(data[i])

print(tables)

Since every element you want contains the substring test_data_, you could filter for that static phrase without requiring a regex:

data = filter(lambda v: 'test_data_' in v, data)

If you then want to strip out the space and pipe separators, you could use translate to remove the unwanted characters:

# Python 3 form; on Python 2 the equivalent call was v.translate(None, " |")
data = map(lambda v: v.translate(str.maketrans("", "", " |")), data)

Of course, the two expressions could be combined into a single compound expression.
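As a rough sketch of such a combined expression, the filter and the cleanup can be written as one list comprehension. This assumes Python 3 and the sample data from the question; the name strip_separators is only illustrative.

```python
data = ['| test_data_14865428_0              |',
        '| test_data_1486612450_0              |',
        '| test_template                  |',
        '|id_1475020800_0              |']

# Translation table that deletes spaces and pipes (hypothetical helper name).
strip_separators = str.maketrans('', '', ' |')

# Keep only the elements containing 'test_data_' and clean each one up.
tables = [v.translate(strip_separators) for v in data if 'test_data_' in v]
print(tables)  # ['test_data_14865428_0', 'test_data_1486612450_0']
```

A list comprehension is used here instead of nested filter and map calls so the result is a real list on Python 3 rather than a lazy iterator.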


One problem with the regex in the original code above is that the | needs to be escaped so that it is treated literally. Currently it is treated as the alternation operator.
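The effect of the unescaped | can be checked directly. In the sketch below, the first pattern is the one from the question (written as a raw string); its left-hand alternative is empty, so match succeeds with an empty match on every element. The second pattern is one possible escaped variant, not necessarily the one the asker intended, and the names unescaped and escaped are illustrative.

```python
import re

data = ['| test_data_14865428_0              |',
        '| test_data_1486612450_0              |',
        '| test_template                  |',
        '|id_1475020800_0              |']

# "|" splits the pattern into two alternatives: the empty string, or " test_data\S".
unescaped = re.compile(r"| test_data\S")
# The empty alternative matches at position 0 of any string.
print(all(unescaped.match(s) is not None for s in data))  # True
print(repr(unescaped.match(data[2]).group(0)))            # ''

# With the pipe escaped, only the test_data_ elements match.
escaped = re.compile(r"\| test_data_\S+")
matched = [s for s in data if escaped.match(s)]
print(len(matched))  # 2
```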

Though perhaps not the most elegant implementation, the following is one option:

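import re

# Raw string, so \| reaches the regex engine and matches a literal pipe.
pattern = re.compile(r"\| *(test_data_[\d_]+)")
def search(val):
    # Return the captured table name, or None when the element does not match.
    found = pattern.match(val)
    return found and found.group(1)
# list() is needed on Python 3, where filter() returns a lazy iterator.
print(list(filter(lambda f: f, map(search, data))))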

The filter with the identity function just removes the records that had no match.
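To illustrate that last step in isolation, here is a small sketch. The results list is an assumption: it is what mapping search over the question's sample data should yield, with None standing for the two elements that do not match.

```python
# Assumed output of map(search, data) for the sample data in the question.
results = ['test_data_14865428_0', 'test_data_1486612450_0', None, None]

# The identity lambda keeps only truthy values, so the None entries are dropped.
kept = list(filter(lambda f: f, results))

# Passing None as the function is the built-in shorthand for the same test.
same = list(filter(None, results))

print(kept)          # ['test_data_14865428_0', 'test_data_1486612450_0']
print(kept == same)  # True
```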

Use filter to select the values that start with "| test_data_", then map a function across those values to clean up the strings. No regex required.

import operator

td = map(lambda s: s[2:].split(' ', 1)[0], 
         filter(operator.methodcaller('startswith', '| test_data_'),
                data))
print(list(td))
