
Using re to extract patterns from a list in Python 2.7

I have the following list of file names from a local directory:

['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']

I want to extract the yyyymmdd directory names, not the tar.gz files. This is the ideal result:

['20150301', '20150302', '20150303']

I tried this:

import re
pattern = "^(?!.*tar.gz).*$"
file_list = ['20150301',
 '20150301100.tar.gz',
 '20150302',
 '20150302100.tar.gz',
 '20150303',
 '20150303100.tar.gz']
# Fails with a TypeError: re.match() expects a string, not a list
matchOB = re.match(pattern, file_list)

Thanks for reading.

You can simply check for items that don't have '.tar.gz' in their name.

for fyle in ['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']:
    if '.tar.gz' not in fyle:
        print fyle

This gives the output:

20150301
20150302
20150303

To have the output as a list:

my_list = ['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']
print [x for x in my_list if '.tar.gz' not in x]

This outputs:

['20150301', '20150302', '20150303']
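If you only want to exclude names that end in .tar.gz (rather than contain it anywhere), str.endswith is a slightly stricter check. A minimal sketch, assuming the same list as above:

```python
file_list = ['20150301', '20150301121501.tar.gz', '20150302',
             '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']

# Keep every name that does not end with the .tar.gz suffix
dirs = [x for x in file_list if not x.endswith('.tar.gz')]
print(dirs)  # ['20150301', '20150302', '20150303']
```

The difference only matters for names that contain ".tar.gz" somewhere other than the end, which the `in` check would also drop.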

You can use this code with a ^\d+$ regex, which matches a whole string composed only of digits:

import re

file_list = ['20150301',
 '20150301100.tar.gz',
 '20150302',
 '20150302100.tar.gz',
 '20150303',
 '20150303100.tar.gz']
matchOB = [x for x in file_list if re.search(r"^\d+$", x)]
print(matchOB)

Output:

['20150301', '20150302', '20150303']

The [x for x in file_list if re.search(r"^\d+$", x)] list comprehension returns every element of the list that consists only of one or more digits.

If your date-like names always contain exactly 8 digits, you can replace the ^\d+$ pattern with ^\d{8}$.
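A sketch of the 8-digit variant, compiling the pattern once. re.match already anchors at the start of the string, so only the trailing $ is needed here:

```python
import re

file_list = ['20150301', '20150301100.tar.gz', '20150302',
             '20150302100.tar.gz', '20150303', '20150303100.tar.gz']

# Exactly eight digits: re.match anchors at the start, $ anchors at the end
date_dir = re.compile(r"\d{8}$")
matchOB = [x for x in file_list if date_dir.match(x)]
print(matchOB)  # ['20150301', '20150302', '20150303']
```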

Using string processing:

We can use the string method isdigit() and the len() function to validate each name.

Demo:

>>> result = []
>>> input_dirs = ['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']
>>> for i in input_dirs:
...   if i.isdigit() and len(i)==8:
...     result.append(i)
... 
>>> print result
['20150301', '20150302', '20150303']
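The isdigit()/len() check accepts any 8-digit string, including one like 20151301 that is not a real date. If the directory names must also be valid calendar dates, one option (an addition, not part of the answer above) is to try parsing them with datetime.strptime and the %Y%m%d format. The helper name is_date_dir is hypothetical:

```python
from datetime import datetime

def is_date_dir(name):
    """Hypothetical helper: True if name is an 8-digit yyyymmdd date."""
    # strptime also accepts non-zero-padded fields, so keep the length check
    if not (name.isdigit() and len(name) == 8):
        return False
    try:
        datetime.strptime(name, '%Y%m%d')
    except ValueError:
        # Invalid month or day, e.g. month 13
        return False
    return True

input_dirs = ['20150301', '20150301121501.tar.gz', '20151301', '20150302']
result = [d for d in input_dirs if is_date_dir(d)]
print(result)  # ['20150301', '20150302']
```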

Your expression works if you replace the last statement with:

matchOB = [re.match(pattern, file).group() for file in file_list if re.match(pattern, file)]
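For reference, a sketch of the same idea with the dots escaped (an unescaped . matches any character) and the pattern compiled once. Because the pattern is anchored with ^ and $, a successful match spans the whole name for ordinary file names, so you can keep the element itself instead of calling re.match twice:

```python
import re

file_list = ['20150301', '20150301100.tar.gz', '20150302',
             '20150302100.tar.gz', '20150303', '20150303100.tar.gz']

# Negative lookahead: reject any name containing the literal text ".tar.gz"
pattern = re.compile(r"^(?!.*\.tar\.gz).*$")

# The match covers the whole name, so keep the element directly
matchOB = [name for name in file_list if pattern.match(name)]
print(matchOB)  # ['20150301', '20150302', '20150303']
```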

Or something like this:

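file_list = ['20150301', '20150301121501.tar.gz', '20150302', '20150302121501.tar.gz', '20150303', '20150303121501.tar.gz']
new_list = []

# Keep names that contain no "." at all (str.find returns -1 when not found)
for name in file_list:
    if name.find(".") < 0:
        new_list.append(name)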
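Since the names come from a local directory, another option is to skip name matching and ask the filesystem which entries are directories. A minimal sketch using os.listdir and os.path.isdir; the helper name list_subdirs and its path argument are hypothetical:

```python
import os

def list_subdirs(path):
    """Hypothetical helper: sorted names of the entries in path that are directories."""
    # os.listdir returns bare names, so join them with path before testing
    return sorted(name for name in os.listdir(path)
                  if os.path.isdir(os.path.join(path, name)))
```

Called on the directory in question, this would return only the yyyymmdd folders, assuming the .tar.gz entries are regular files. os.listdir makes no ordering guarantee, hence the sorted().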
