Searching for a pattern in a list of strings - Python

Question

I have a list of strings containing file names, for example:

file_names = ['filei.txt','filej.txt','filek.txt','file2i.txt','file2j.txt','file2k.txt','file3i.txt','file3j.txt','file3k.txt']

Then I remove the .txt extension using:

import os

extension = os.path.commonprefix([n[::-1] for n in file_names])[::-1]

file_names_strip = [n[:-len(extension)] for n in file_names]

and then get the last character of each string in the list file_names_strip:

h = [n[-1:] for n in file_names_strip]

This gives h = ['i', 'j', 'k', 'i', 'j', 'k', 'i', 'j', 'k']
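
For reference, the same last characters can also be obtained with os.path.splitext, which splits on the final dot rather than relying on a suffix shared by every name. This is only a sketch: the helper name last_stem_chars is made up for illustration, and it assumes each name has at most one extension to drop.

```python
import os

def last_stem_chars(file_names):
    # Hypothetical helper: last character of each name once its extension is removed.
    stems = [os.path.splitext(name)[0] for name in file_names]
    return [stem[-1:] for stem in stems]

file_names = ['filei.txt', 'filej.txt', 'filek.txt', 'file2i.txt']
print(last_stem_chars(file_names))  # ['i', 'j', 'k', 'i']
```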

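How can I test h for a string pattern? That is, return True if i, j, k occur in sequence and False otherwise. I need to know this because not all file names are formatted like the ones in file_names.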

So:

test_ijk_pattern(h) = True

no_pattern = ['1','2','3','1','2','3','1','2','3']

test_ijk_pattern(no_pattern) = False

Answer 1

Here is how I would attack this:

def patternFinder(h):
    # Takes a list and returns a list of the pattern if found, otherwise returns an empty list

    if h[0] in h[1:]:
        rptIndex = h[1:].index(h[0]) + 1  # Index of the second instance of the first element in the list
    else:
        print("This list has no pattern")
        return []

    if len(h) % rptIndex != 0:
        h = h[:-(len(h) % rptIndex)]  # Takes off extra entries at the end which would break the next step

    # Divide h into sublists which should all have the same pattern
    subLists = [h[i:i+rptIndex] for i in range(0, len(h), rptIndex)]

    hasPattern = True  # Assume the list has a pattern
    numReps = 0  # Number of times the pattern appears

    for subList in subLists:
        if subList != subLists[0]:
            hasPattern = False
        else:
            numReps += 1

    if hasPattern and numReps != 1:
        pattern = subLists[0]  # The repeating sublist itself, not its first element
        return pattern
    else:
        print("This list has no pattern")
        return []

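This makes the following assumptions:

The pattern shows up within the first few characters
An incomplete pattern at the end does not matter ([1,2,3,1,2,3,1,2] will come up as 2 instances of [1,2,3])
h has at least 2 entries
There are no extra characters between patterns

If you are OK with these assumptions, this will work for you. Hope this helps!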
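
To make the assumptions concrete, here is a compact sketch of the same idea. The name find_repeating_unit is hypothetical, and it differs from the function above in one way: a trailing partial repetition is also checked against the pattern instead of being cut off.

```python
def find_repeating_unit(items):
    # Hypothetical helper: return the repeating unit, or [] if none is found.
    if len(items) < 2 or items[0] not in items[1:]:
        return []
    # Distance from the first item to its second occurrence is taken as the period.
    period = items[1:].index(items[0]) + 1
    unit = items[:period]
    # Every position must agree with the unit, including a trailing partial repetition.
    if all(item == unit[i % period] for i, item in enumerate(items)):
        return unit
    return []

print(find_repeating_unit(['i', 'j', 'k', 'i', 'j', 'k', 'i', 'j', 'k']))  # ['i', 'j', 'k']
print(find_repeating_unit([1, 2, 3, 1, 2, 3, 1, 2]))  # [1, 2, 3]
print(find_repeating_unit(['i', 'j', 'k', 'x', 'j', 'k']))  # []
```

Like the original, this finds any repeating unit rather than i, j, k specifically, so the returned list still has to be compared with the expected pattern.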

Answer 2

You could use regular expressions.

import re
def test_pattern(pattern, mylist):
  print(pattern)
  print(mylist)
  print("".join(mylist))
  if re.match(r'(%s)+$' % pattern, "".join(mylist)) is not None:  # the pattern must match at least once; nothing else is allowed
    return True
  return False

print(test_pattern("ijk", ["i", "j", "k", "i", "j", "k"]))

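You can do it this way without stripping the last letter and the file extension. I updated the regex so that it works. One problem was that I had used the variable name, so it looked for the literal pattern "mypattern". Using %s replaces it with the actual pattern. I hope this solution suits you.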

myfiles = ["ai.txt", "aj.txt", "ak.txt", "bi.txt", "bj.txt", "bk.txt"]
mypattern = ["i", "j", "k"]

import re
# pattern as a list e.g. ["i", "j", "k"]
def test_pattern(pattern, filenames):
    # Group with parentheses; square brackets would form a character class that also matches the empty string
    mypattern = "(" + r"\.[a-zA-Z0-9]*".join(pattern) + r"\.[a-zA-Z0-9]*)+"
    # this pattern matches an "i" followed by a dot and any letters or digits, then "j.", any letters or digits, then "k." (change the character class if your file names contain other characters)
    print(mypattern)
    print("".join(filenames))
    if re.search(mypattern, "".join(filenames)) is not None:  # the pattern must occur at least once
        return True
    return False

print(test_pattern(mypattern, myfiles))

Output:

(i\.[a-zA-Z0-9]*j\.[a-zA-Z0-9]*k\.[a-zA-Z0-9]*)+
ai.txtaj.txtak.txtbi.txtbj.txtbk.txt
True
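
If a regex feels heavy for this, the same check can be sketched without one. This is not part of the answer above, just an illustration: the function name follows_pattern, its pattern default, and the use of os.path.splitext and itertools.cycle are all assumptions of the sketch. It accepts a trailing partial repetition and requires at least one full run of the pattern.

```python
import os
from itertools import cycle

def follows_pattern(file_names, pattern=('i', 'j', 'k')):
    # Last character of each file name once the extension is removed.
    last_chars = [os.path.splitext(name)[0][-1:] for name in file_names]
    # Require at least one full run of the pattern.
    if len(last_chars) < len(pattern):
        return False
    # Compare each character with the pattern repeated in order.
    return all(c == p for c, p in zip(last_chars, cycle(pattern)))

print(follows_pattern(["ai.txt", "aj.txt", "ak.txt", "bi.txt", "bj.txt", "bk.txt"]))  # True
print(follows_pattern(["a1.txt", "a2.txt", "a3.txt"]))  # False
```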

2 solutions

Solution 1
1 2013-11-08 14:16:17

Solution 2
0 2013-11-08 14:02:20
