
Extract substring from filename in Python?

I have a directory full of files that have date strings as part of the filenames:

file_type_1_20140722_foo.txt
file_type_two_20140723_bar.txt
filetypethree20140724qux.txt

I need to get these date strings from the filenames and save them in an array:

['20140722', '20140723', '20140724']

But they can appear at various places in the filename, so I can't just use substring notation and extract it directly. In the past, the way I've done something similar to this in Bash is like so:

date=$(echo $file | egrep -o '[[:digit:]]{8}' | head -n1)

But I can't use Bash for this because it sucks at math (I need to be able to add and subtract floating-point numbers). I've tried glob.glob() and re.match(), but both return empty lists:

>>> dates = [file for file in sorted(os.listdir('.')) if re.match("[0-9]{8}", file)]
>>> print(dates)
[]

I know the problem is it's looking for complete file names that are eight digits long, but I have no idea how to make it look for substrings instead. Any ideas?

>>> import re
>>> import os
>>> [date for file in os.listdir('.') for date in re.findall(r"\d{8}", file)]
['20140722', '20140723']

Note that if a filename has a 9-digit substring, then only the first 8 digits will be matched. If a filename contains a 16-digit substring, there will be 2 non-overlapping matches.
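The two edge cases in that note can be checked directly. The sketch below is not part of the original answer: the sample strings are made up, and the lookaround pattern `(?<!\d)\d{8}(?!\d)` is an assumed alternative that only helps if runs longer than eight digits should be skipped rather than truncated.

```python
import re

# Hypothetical sample strings, chosen only to show the edge cases.
nine_digits = "report_201407221_foo.txt"
sixteen_digits = "2014072220140723.txt"

# Plain \d{8}: takes the first 8 digits of a longer run,
# and splits a 16-digit run into two non-overlapping matches.
loose_nine = re.findall(r"\d{8}", nine_digits)
loose_sixteen = re.findall(r"\d{8}", sixteen_digits)

# Assumed stricter variant: the lookarounds require that the 8 digits
# are neither preceded nor followed by another digit.
strict = re.compile(r"(?<!\d)\d{8}(?!\d)")
strict_nine = strict.findall(nine_digits)
strict_plain = strict.findall("file_type_1_20140722_foo.txt")

print(loose_nine)     # ['20140722']
print(loose_sixteen)  # ['20140722', '20140723']
print(strict_nine)    # []
print(strict_plain)   # ['20140722']
```

Whether the strict form is wanted depends on the data: it drops a name like the 9-digit one entirely instead of returning a possibly wrong date.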

re.match matches from the beginning of the string. re.search matches the pattern anywhere. Or you can try this:

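import os
import re
extract_dates = re.compile("[0-9]{8}").findall
# Sort the per-file match lists, then keep the first match of each non-empty one.
dates = [found[0] for found in sorted(
    extract_dates(filename) for filename in os.listdir('.')) if found]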
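To make the difference between re.match and re.search concrete, here is a small sketch using one of the filenames from the question. The variable names are illustrative only.

```python
import re

pattern = re.compile(r"[0-9]{8}")
filename = "file_type_1_20140722_foo.txt"  # example name from the question

# match() only succeeds if the pattern matches at position 0.
at_start = pattern.match(filename)
# search() scans the whole string for the first occurrence.
anywhere = pattern.search(filename)

print(at_start)           # None
print(anywhere.group(0))  # 20140722

# match() does succeed when the name begins with the digits.
leading = pattern.match("20140722_foo.txt")
print(leading.group(0))   # 20140722
```

This is why the list comprehension in the question came back empty: none of the filenames start with eight digits.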

Your regular expression looks good, but you should be using re.search instead of re.match so that it will search for that expression anywhere in the string:

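import re
filename = "file_type_1_20140722_foo.txt"  # example name from the question
r = re.compile("[0-9]{8}")
m = r.search(filename)  # search() scans the whole string, unlike match()
if m:
    print(m.group(0))  # the first 8-digit run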
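Applied to a whole set of names, the same re.search call can build the list the question asks for. This is a minimal sketch, not the answerer's code: the helper name `extract_dates` and the extra `notes.txt` entry are assumptions, and a hard-coded list stands in for the directory listing so the example runs anywhere.

```python
import re

DATE_RE = re.compile(r"[0-9]{8}")

def extract_dates(filenames):
    """Return the first 8-digit run from each name that contains one."""
    dates = []
    for name in sorted(filenames):
        m = DATE_RE.search(name)
        if m:  # skip names without an 8-digit run
            dates.append(m.group(0))
    return dates

# Filenames from the question, plus one hypothetical name with no date.
names = [
    "filetypethree20140724qux.txt",
    "file_type_1_20140722_foo.txt",
    "notes.txt",
    "file_type_two_20140723_bar.txt",
]
result = extract_dates(names)
print(result)  # ['20140722', '20140723', '20140724']
```

In a real directory, `extract_dates(os.listdir('.'))` would do the same after `import os`. Note that the result follows the sort order of the filenames, not of the dates, so wrap it in `sorted()` if date order matters.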

