Using re to extract some patterns from a list (Python 2.7)

Question

I have a list of filenames like this from a local directory.

['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']

I want to extract the yyyymmdd directory names, not the tar.gz files. This is the ideal result:

['20150301', '20150302', '20150303']

I tried this one.

import re
pattern = "^(?!.*tar.gz).*$"
file_list = ['20150301',
 '20150301100.tar.gz',
 '20150302',
 '20150302100.tar.gz',
 '20150303',
 '20150303100.tar.gz']
matchOB = re.match(pattern , file_list)

Thanks for reading.

Answer 1

You can simply check for items that don't have '.tar.gz' in their name.

for fyle in ['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']:
    if '.tar.gz' not in fyle:
        print fyle

gives output:

20150301
20150302
20150303

To have the output as a list:

my_list = ['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']
print [x for x in my_list if '.tar.gz' not in x]

has output:

['20150301', '20150302', '20150303']
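
A slightly stricter variant, as a sketch only: `'.tar.gz' not in x` also drops any name that merely contains `.tar.gz` somewhere in the middle. If only the suffix matters, `str.endswith` says that directly. The helper name `without_archives` is made up for illustration and is not part of the answer above.

```python
def without_archives(names, suffix='.tar.gz'):
    """Return the names that do not end with the given suffix."""
    return [name for name in names if not name.endswith(suffix)]


my_list = ['20150301', '20150301121501.tar.gz', '20150302',
           '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']

# Only the names ending in '.tar.gz' are dropped.
print(without_archives(my_list))
```

The call form `print(...)` with a single argument behaves the same on Python 2.7 and Python 3.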

Answer 2

You can use this code with the ^\d+$ regex, which matches a whole string composed of digits only:

import re

file_list = ['20150301',
 '20150301100.tar.gz',
 '20150302',
 '20150302100.tar.gz',
 '20150303',
 '20150303100.tar.gz']
matchOB = [x for x in file_list if re.search(r"^\d+$", x)]
print(matchOB)

Output:

['20150301', '20150302', '20150303']

The [x for x in file_list if re.search(r"^\d+$", x)] list comprehension returns every element of the list that consists only of one or more digits.

If your date-like pattern always contains 8 digits, you can replace the ^\d+$ pattern with ^\d{8}$.
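
A rough sketch of the 8-digit variant, assuming the same sample list as above. The pattern is compiled once and reused for every element; the names `DATE_DIR` and `date_dirs` are illustrative and not from the original code.

```python
import re

# Exactly eight digits between the start and the end of the string.
DATE_DIR = re.compile(r"^\d{8}$")

file_list = ['20150301',
             '20150301100.tar.gz',
             '20150302',
             '20150302100.tar.gz',
             '20150303',
             '20150303100.tar.gz']

# Keep only the names that match the pattern.
date_dirs = [x for x in file_list if DATE_DIR.search(x)]
print(date_dirs)
```

Unlike ^\d+$, this rejects all-digit names of any other length, such as a 9-digit string.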

Answer 3

By string processing:

We can use the isdigit() string method and the len() function to validate each string.

Demo:

>>> result = []
>>> input_dirs = ['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']
>>> for i in input_dirs:
...   if i.isdigit() and len(i)==8:
...     result.append(i)
... 
>>> print result
['20150301', '20150302', '20150303']
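
The isdigit() and len() checks accept any 8 digits, including strings such as '20150230' that are not real dates. If that matters, one possible extension (a sketch, not part of the answer above) is to also parse the name with `datetime.strptime`. The function name `is_date_name` and the extra sample entry '20150230' are assumptions added for illustration.

```python
from datetime import datetime


def is_date_name(name):
    """True if name is exactly 8 digits and parses as a real yyyymmdd date."""
    if not (name.isdigit() and len(name) == 8):
        return False
    try:
        datetime.strptime(name, '%Y%m%d')
    except ValueError:
        # Raised for impossible dates such as February 30.
        return False
    return True


# '20150230' is a made-up entry to show the date check rejecting it.
input_dirs = ['20150301', '20150301121501.tar.gz', '20150302',
              '20150302121501.tar.gz', '20150303', '20150230']
result = [d for d in input_dirs if is_date_name(d)]
print(result)
```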

Answer 4

The expression works if you replace the last statement with:

matchOB = [re.match(pattern, file).group() for file in file_list if re.match(pattern, file)]
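
For context: `re.match` expects a single string, so passing the whole list raises a TypeError, which is why the comprehension above applies it to one element at a time. The following is a hedged sketch of the same idea that calls the pattern once per name and escapes the dots, since an unescaped `.` in the original pattern matches any character. The sample list is the one from the question.

```python
import re

# Negative lookahead: reject any string that contains a literal ".tar.gz".
pattern = re.compile(r"^(?!.*\.tar\.gz).*$")

file_list = ['20150301',
             '20150301100.tar.gz',
             '20150302',
             '20150302100.tar.gz',
             '20150303',
             '20150303100.tar.gz']

# match() is applied per element, never to the list itself.
matchOB = [name for name in file_list if pattern.match(name)]
print(matchOB)
```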

Answer 5

Or something like this:

names = ['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']
new_list = []

for name in names:
    # find() returns -1 when there is no dot, i.e. for the directory names.
    if name.find(".") < 0:
        new_list.append(name)
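
Since the names come from a local directory, another option, different from the string tests above, is to ask the filesystem which entries are directories. This is only a sketch under the assumption that the yyyymmdd entries really are directories on disk; the helper name `subdirectories` and the temporary layout created for the demo are made up.

```python
import os
import shutil
import tempfile


def subdirectories(base):
    """Return the sorted names of the entries in base that are directories."""
    return sorted(name for name in os.listdir(base)
                  if os.path.isdir(os.path.join(base, name)))


# Demo in a throwaway directory so the example does not touch real data.
base = tempfile.mkdtemp()
try:
    for day in ('20150301', '20150302'):
        os.mkdir(os.path.join(base, day))
        # Create an empty archive file next to each directory.
        open(os.path.join(base, day + '121501.tar.gz'), 'w').close()
    found = subdirectories(base)
finally:
    shutil.rmtree(base)

print(found)
```

This does not look at the names at all, so it can be combined with any of the name checks if both conditions are needed.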

5 answers

solution1
0 2015-04-07 08:32:56

solution2
0 ACCEPTED 2015-04-07 08:35:11

solution3
0 2015-04-07 08:38:48

solution4
0 2015-04-07 08:38:54

solution5
0 2015-04-07 09:34:18
