
Extract part of string based on a template in Python

I'd like to use Python to read in a list of directories and store data in variables based on a template such as /home/user/Music/%artist%/[%year%] %album%.

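An example of what I'm after (pseudocode: text stands for a placeholder in the template and key for the matching piece of the path):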

artist, year, album = None, None, None

template = "/home/user/Music/%artist%/[%year%] %album%"
path = "/home/user/Music/3 Doors Down/[2002] Away From The Sun"

if text == "%artist%":
    artist = key

if text == "%year%":
    year = key

if text == "%album%":
    album = key

print(artist)
# 3 Doors Down

print(year)
# 2002

print(album)
# Away From The Sun

I can do the reverse easily enough with str.replace("%artist%", artist), but how can I extract the data?

If your folder structure always follows the template, the following should work without the need for regular expressions.

path = "/home/user/Music/3 Doors Down/[2002] Away From The Sun"

path_parts = path.split("/")  # split the path into a list on slashes

print(path_parts)

artist = path_parts[4]  # element at index 4 (index 0 is the empty string before the leading slash)

year = path_parts[5][1:5]  # characters at indices 1 through 4 of element 5 (the slice end, 5, is excluded)

album = path_parts[5][7:]  # everything from index 7 on, skipping "[2002] "

print(artist)
# 3 Doors Down

print(year)
# 2002
    
print(album)
# Away From The Sun
    
# put the path back together using an f-string (no need for str.replace)
reconstructed_path = f"/home/user/Music/{artist}/[{year}] {album}"
    
print(reconstructed_path)

Output:

['', 'home', 'user', 'Music', '3 Doors Down', '[2002] Away From The Sun']
3 Doors Down
2002
Away From The Sun
/home/user/Music/3 Doors Down/[2002] Away From The Sun
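
If the layout is less fixed, or you want to keep the %name% template from the question, one option is to turn the template into a regular expression with named groups. This is only a sketch: template_to_regex and extract_fields are made-up helper names, and it assumes placeholder names are plain word characters, each name appears once, and no value contains a slash.

```python
import re

def template_to_regex(template):
    """Build a compiled regex from a template with %name% placeholders."""
    # With one capture group, re.split alternates: literal, name, literal, name, ...
    parts = re.split(r"%(\w+)%", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2:
            # Placeholder name: capture one path component (assumption: no "/" in values)
            pattern += f"(?P<{part}>[^/]+?)"
        else:
            # Literal text from the template: escape regex metacharacters such as "["
            pattern += re.escape(part)
    return re.compile(pattern)

def extract_fields(template, path):
    """Return a dict of placeholder name -> value, or None if the path does not fit."""
    match = template_to_regex(template).fullmatch(path)
    return match.groupdict() if match else None

template = "/home/user/Music/%artist%/[%year%] %album%"
path = "/home/user/Music/3 Doors Down/[2002] Away From The Sun"

fields = extract_fields(template, path)
print(fields)
# {'artist': '3 Doors Down', 'year': '2002', 'album': 'Away From The Sun'}
```

Unlike fixed indices, this returns None when a path does not fit the template, so badly named folders can be skipped rather than silently mis-parsed.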

The following works for me:

from difflib import SequenceMatcher

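def extract(template, text):
    # Diff the template against the text (autojunk=True is the default, spelled out here)
    seq = SequenceMatcher(None, template, text, autojunk=True)
    # Each 'replace' opcode marks a placeholder; text[c:d] is the value that took its place
    return [text[c:d] for tag, a, b, c, d in seq.get_opcodes() if tag == 'replace']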

template = "home/user/Music/%/[%] %"
path = "home/user/Music/3 Doors Down/[2002] Away From The Sun"

artist, year, album = extract(template, path)

print(artist)
print(year)
print(album)

Output:

3 Doors Down
2002
Away From The Sun

Each placeholder in the template can be any single character, as long as that character does not appear in any of the values to be extracted.
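
To use this with the %artist%-style template from the question, the names can be collected first and each placeholder collapsed to a single character before diffing. The following is a rough sketch, not part of the answer above: extract_named and PLACEHOLDER are hypothetical names, and it assumes a NUL character never occurs in a real path. Since SequenceMatcher is a general-purpose diff, the sketch raises an error when the number of extracted values does not match the number of names instead of guessing.

```python
import re
from difflib import SequenceMatcher

PLACEHOLDER = "\x00"  # assumption: this character never appears in a path

def extract_named(template, text):
    """Map %name% placeholders in template to the matching pieces of text."""
    names = re.findall(r"%(\w+)%", template)
    # Collapse every %name% to one character so the diff sees a single-character swap
    simplified = re.sub(r"%\w+%", lambda m: PLACEHOLDER, template)
    matcher = SequenceMatcher(None, simplified, text)
    values = [
        text[j1:j2]
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace" and simplified[i1:i2] == PLACEHOLDER
    ]
    if len(values) != len(names):
        raise ValueError("text does not line up with the template")
    return dict(zip(names, values))

template = "/home/user/Music/%artist%/[%year%] %album%"
path = "/home/user/Music/3 Doors Down/[2002] Away From The Sun"

result = extract_named(template, path)
print(result["artist"])
print(result["year"])
print(result["album"])
```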

