
How to extract the substring from the end of the string?

I am trying to extract the timestamp from the end of my file names.

Example file names: ABC_xyz_march_2020.xlsx, xyz_mno__20_07_2019.xlsx, xyz_spa_20-07-2019.xlsx, xyz-mar_2019.csv, ABC-dec-5.csv, etc. Expected output: march_2020, 20_07_2019, 20-07-2019, mar_2019, dec-5, etc.

I am using the string split() method, but it does not extract the month when the month is written as a word.

Can anyone suggest a different approach?
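A split()-based approach can still pick up month names if it walks backwards over the pieces of the name and keeps every trailing piece that looks like part of a date. The sketch below assumes that the date is always the last run of `_`- or `-`-separated tokens before the extension, and that months are written as full English names or three-letter abbreviations; `trailing_date` and `MONTHS` are hypothetical names, not part of any library.

```python
import os
import re

# Assumption: months appear as full English names or 3-letter abbreviations.
MONTH_NAMES = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)
MONTHS = set(MONTH_NAMES) | {name[:3] for name in MONTH_NAMES}


def trailing_date(filename: str) -> str:
    """Return the trailing run of date-like tokens (digits or month names).

    Hypothetical helper: returns "" if the name does not end in a date.
    """
    stem, _ext = os.path.splitext(filename)  # drop ".xlsx", ".csv", ...
    # A capturing group makes re.split keep the separators, so the output
    # preserves the original "_" or "-" characters.
    parts = re.split(r"([_-])", stem)
    # parts alternates token, separator, token, ...; tokens are at even indexes.
    start = len(parts)
    for i in range(len(parts) - 1, -1, -2):
        token = parts[i]
        if token.isdigit() or token.lower() in MONTHS:
            start = i  # still inside the date, keep walking left
        else:
            break  # first non-date token ends the date
    return "".join(parts[start:])
```

For example, `trailing_date("ABC-dec-5.csv")` gives `'dec-5'` and `trailing_date("xyz_mno__20_07_2019.xlsx")` gives `'20_07_2019'`. Because it accepts any trailing digits, a numeric token just before the date (as in `report_2_march_2020.csv`) would be swallowed too, and spellings such as `sept` are not recognised; the regular-expression approach is stricter about the exact formats.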

I don't often reach for regular expressions, but sometimes they are the right tool for the job:

```python
import re

# One compiled regular expression per date format seen in the file names.
# Each pattern captures the date that sits right before the "." of the extension.
PATTERNS = [
    re.compile(pattern)
    for pattern in (
        r"([A-Za-z]+_\d{4})\..*",  # {month_name}_{year}
        r"(\d{2}_\d{2}_\d{4})\..*",  # {day}_{month}_{year}
        r"(\d{2}-\d{2}-\d{4})\..*",  # {day}-{month}-{year}
        r"([A-Za-z]+-\d{1,2})\..*",  # {month_name}-{day}
    )
]

inputs = [
    "ABC_xyz_march_2020.xlsx",
    "xyz_mno__20_07_2019.xlsx",
    "xyz_spa_20-07-2019.xlsx",
    "xyz-mar_2019.csv",
    "ABC-dec-5.csv",
]

outputs = [
    "march_2020",
    "20_07_2019",
    "20-07-2019",
    "mar_2019",
    "dec-5",
]

# Try the patterns in order; the first one that matches wins.
# (The f-string "=" specifier below requires Python 3.8+.)
for filename, expected_output in zip(inputs, outputs):
    for pattern in PATTERNS:
        match = pattern.search(filename)
        if not match:
            continue
        matched_date = match.group(1)
        if matched_date != expected_output:
            raise ValueError(
                f"For {filename=}, got {matched_date=} instead of {expected_output=}"
            )
        print(f"Looked at {filename=} and found {matched_date=}")
        break
```

This builds a list of compiled regular expression objects. Then, for each input filename, it tries the regular expressions in order until one matches. If no pattern matches, the inner loop simply moves on to the next file without reporting anything, so error handling (deciding what to do when no pattern matches) is left to the reader.
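One way to make the no-match case explicit is to wrap the search in a function that returns `None` when nothing matches, leaving the decision (skip, log, or raise) to the caller. This is a minimal sketch reusing the same four patterns; `extract_date` is a hypothetical name.

```python
import re
from typing import Optional

# Same patterns as in the answer above, tried in this order.
PATTERNS = [
    re.compile(pattern)
    for pattern in (
        r"([A-Za-z]+_\d{4})\..*",  # {month_name}_{year}
        r"(\d{2}_\d{2}_\d{4})\..*",  # {day}_{month}_{year}
        r"(\d{2}-\d{2}-\d{4})\..*",  # {day}-{month}-{year}
        r"([A-Za-z]+-\d{1,2})\..*",  # {month_name}-{day}
    )
]


def extract_date(filename: str) -> Optional[str]:
    """Return the date part of filename, or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.search(filename)
        if match:
            return match.group(1)  # first matching pattern wins
    return None  # no known date format: the caller decides what to do
```

A caller can then check `extract_date(name) is None` to skip or report files whose names contain no recognisable date, instead of having them pass through the loop silently.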


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM