I have a list of strings containing filenames, such as:
file_names = ['filei.txt','filej.txt','filek.txt','file2i.txt','file2j.txt','file2k.txt','file3i.txt','file3j.txt','file3k.txt']
I then remove the .txt extension using:
import os
extension = os.path.commonprefix([n[::-1] for n in file_names])[::-1]  # common suffix of all names, e.g. '.txt'
file_names_strip = [n[:-len(extension)] for n in file_names]
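The reversed commonprefix line works because reversing every name turns their shared suffix into a shared prefix, which os.path.commonprefix finds character by character. Two pitfalls are worth guarding against: if the names share no suffix, len(extension) is 0 and n[:-0] is n[:0], the empty string; and the common suffix can be longer than the extension (for ['ai.txt', 'bi.txt'] it is 'i.txt'). A minimal sketch with a guard for the first case; the helper name strip_common_suffix is hypothetical:

```python
import os

def strip_common_suffix(names):
    """Remove the suffix shared by all names (hypothetical helper).

    Reversing each name turns the common suffix into a common prefix,
    which os.path.commonprefix can find.
    """
    suffix = os.path.commonprefix([n[::-1] for n in names])[::-1]
    if not suffix:
        # n[:-0] would be n[:0] == '', so return the names unchanged
        return list(names)
    return [n[:-len(suffix)] for n in names]

print(strip_common_suffix(['filei.txt', 'filej.txt', 'filek.txt']))  # ['filei', 'filej', 'filek']
print(strip_common_suffix(['a.txt', 'b.csv']))                      # ['a.txt', 'b.csv']
print(strip_common_suffix(['ai.txt', 'bi.txt']))                    # ['a', 'b']: the 'i' is stripped too
```

The last call shows the second pitfall: when every name ends in the same letter, that letter is treated as part of the suffix.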
Then I take the last character of each string in the list file_names_strip:
h = [n[-1:] for n in file_names_strip]
Which gives h = ['i', 'j', 'k', 'i', 'j', 'k', 'i', 'j', 'k']
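As an alternative (a swapped-in technique, not what the question uses), os.path.splitext splits each name at its own last dot, so the result does not depend on every file sharing the same extension. A minimal sketch of the strip-and-take-last-character step in one comprehension:

```python
import os

file_names = ['filei.txt', 'filej.txt', 'filek.txt',
              'file2i.txt', 'file2j.txt', 'file2k.txt',
              'file3i.txt', 'file3j.txt', 'file3k.txt']

# splitext returns (root, ext); keep the last character of the root,
# sliced with [-1:] as in the question
h = [os.path.splitext(n)[0][-1:] for n in file_names]
print(h)  # ['i', 'j', 'k', 'i', 'j', 'k', 'i', 'j', 'k']

# each file is split on its own extension
print([os.path.splitext(n)[0][-1:] for n in ['ai.txt', 'aj.csv']])  # ['i', 'j']
```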
How can I test for a pattern of strings in h? That is, if i, j, k occur sequentially, it should return True, and False otherwise. I need to know this because not all file names are formatted like the ones in file_names.
So:
test_ijk_pattern(h)  # should return True
no_pattern = ['1','2','3','1','2','3','1','2','3']
test_ijk_pattern(no_pattern)  # should return False
Here's how I would attack this:
def patternFinder(h):
    """Takes a list and returns the repeating pattern as a list if found, otherwise an empty list."""
    if h[0] in h[1:]:
        rptIndex = h[1:].index(h[0]) + 1  # index of the second occurrence of the first element
    else:
        print("This list has no pattern")
        return []
    if len(h) % rptIndex != 0:
        h = h[:-(len(h) % rptIndex)]  # drop extra entries at the end, which would break the next step
    subLists = [h[i:i+rptIndex] for i in range(0, len(h), rptIndex)]  # split h into sublists that should all match
    hasPattern = True  # assume the list has a pattern
    numReps = 0  # number of times the pattern appears
    for subList in subLists:
        if subList != subLists[0]:
            hasPattern = False
        else:
            numReps += 1
    if hasPattern and numReps != 1:
        pattern = subLists[0]  # the first sublist is the repeating pattern
        return pattern
    else:
        print("This list has no pattern")
        return []
Assumptions that this makes:
1) Extra entries at the end are dropped, so [1,2,3,1,2,3,1,2] will come up with having 2 instances of [1,2,3].
2) h has at least 2 entries.
If you're fine with these assumptions, then this will work for you. Hope this helps!
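patternFinder guesses the block length from the second occurrence of h[0], so a list like [1, 2, 1, 3, 1, 2, 1, 3] is reported as having no pattern even though it repeats [1, 2, 1, 3]. A sketch that tries every block length instead (repeating_unit is a hypothetical name; unlike patternFinder, it requires complete repetitions and does not drop trailing extras):

```python
def repeating_unit(seq):
    """Return the shortest block that seq repeats at least twice, else [].

    Every candidate block length is tried, and seq must consist of
    complete repetitions of the block (no leftover entries).
    """
    seq = list(seq)
    for size in range(1, len(seq) // 2 + 1):
        if len(seq) % size == 0 and seq == seq[:size] * (len(seq) // size):
            return seq[:size]
    return []

print(repeating_unit(['i', 'j', 'k'] * 3))        # ['i', 'j', 'k']
print(repeating_unit([1, 2, 1, 3, 1, 2, 1, 3]))  # [1, 2, 1, 3]
print(repeating_unit([1, 2, 3]))                  # []
```

Like patternFinder, this finds any repeating block, so the question's no_pattern list gives ['1', '2', '3']; to test for i, j, k specifically, compare the result with ['i', 'j', 'k'].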
You could use regex.
import re
def test_pattern(pattern, mylist):
    print(pattern)
    print(mylist)
    print("".join(mylist))
    # re.match anchors at the start and $ at the end, so the joined list must be
    # one or more repetitions of the pattern and nothing else
    if re.match(r'(%s)+$' % pattern, "".join(mylist)) is not None:
        return True
    return False
print(test_pattern("ijk", ["i", "j", "k", "i", "j", "k"]))
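In Python 3.4+, re.fullmatch anchors at both ends without the trailing $, and re.escape keeps characters such as . or + in the pattern from being read as regex syntax. A minimal sketch of the same check (matches_repeated is a hypothetical name):

```python
import re

def matches_repeated(pattern, items):
    """True if the joined items are one or more copies of pattern and nothing else."""
    regex = "(?:%s)+" % re.escape(pattern)  # non-capturing group, repeated
    return re.fullmatch(regex, "".join(items)) is not None

print(matches_repeated("ijk", ["i", "j", "k", "i", "j", "k"]))  # True
print(matches_repeated("ijk", ["1", "2", "3", "1", "2", "3"]))  # False
print(matches_repeated("a.b", ["a", "x", "b"]))                 # False: '.' is escaped
```

Joining loses the boundaries between elements, so ["ij", "k"] also counts as a match; that is harmless when every entry is a single character, as in h.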
You could also do it this way, without stripping the last letters and the file extensions. I updated the regular expression so that it works: one problem was that I used the variable name, so it looked for the literal pattern "mypattern". Using %s substitutes the real pattern. I hope this solution suits you.
myfiles = ["ai.txt", "aj.txt", "ak.txt", "bi.txt", "bj.txt", "bk.txt"]
mypattern = ["i", "j", "k"]
import re
# pattern as a list, e.g. ["i", "j", "k"]
def test_pattern(pattern, filenames):
    mypattern = "(?:" + r"\.[a-zA-Z0-9]*".join(pattern) + r"\.[a-zA-Z0-9]*)+"
    # this pattern matches an "i", a dot and any letters/digits, then "j." and any letters/digits,
    # then "k." and any letters/digits, the whole group repeated one or more times
    # (change the character class if your file names contain other characters)
    print(mypattern)
    print("".join(filenames))
    if re.search(r'%s' % mypattern, "".join(filenames)) is not None:  # True if the pattern occurs at least once anywhere
        return True
    return False
print(test_pattern(mypattern, myfiles))
Output:
(?:i\.[a-zA-Z0-9]*j\.[a-zA-Z0-9]*k\.[a-zA-Z0-9]*)+
ai.txtaj.txtak.txtbi.txtbj.txtbk.txt
True
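The re.search call succeeds as soon as the i, j, k sequence appears once anywhere in the joined string, so extra files that break the cycle are not detected. If every file should follow the cycle, a stricter per-file check is possible. A sketch, assuming each name is a stem without dots followed by a single extension of letters and digits, and that the tag is the last character of the stem (follows_pattern is a hypothetical name):

```python
import re

def follows_pattern(filenames, pattern):
    """True if filenames cycle through pattern, one tag per file, with no extras."""
    if not filenames or len(filenames) % len(pattern) != 0:
        return False
    for k, name in enumerate(filenames):
        tag = re.escape(pattern[k % len(pattern)])
        # stem without dots ending in the expected tag, then one extension
        if re.fullmatch(r"[^.]*%s\.[A-Za-z0-9]+" % tag, name) is None:
            return False
    return True

myfiles = ["ai.txt", "aj.txt", "ak.txt", "bi.txt", "bj.txt", "bk.txt"]
print(follows_pattern(myfiles, ["i", "j", "k"]))                        # True
print(follows_pattern(["ai.txt", "ak.txt", "aj.txt"], ["i", "j", "k"]))  # False
```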