
The most pythonic way to parse specific sub-string in a string?

I have the following log and want to extract the second "DDD-xxxxx" ID from each entry (if a second DDD ID exists):

cs:444 - br:/main/j_DDD-50535/DDD-68009
cs:445 - br:/main/j_DDD-50535/j_DDD-70220
cs:446 - br:/main/j_DDD-50535/j_DDD-70117
cs:447-Merge from branch: /main/j_DDD-50544/j_DDD-61183
Requested by: Smith, John (UserID1)
cs:448-Merge from branch: /main/j_DDD-4822
Requested by: Grant, Huge (userID2)
cs:449-Daily automated release of 3.5.5.4

Using regex I found a workaround to get them, but I think there should be a much easier way:

import re

def read_log():
    log_file_name = "log"
    with open(log_file_name, "r") as file:
        log_file = file.read().split("cs:")
    return log_file

def key_creator():
    log_data = read_log()

    keys = []
    for line in log_data:
        # print(line)
        if line[:5].isdigit():
            search = re.search('/j_(.*)\n', line)
            if hasattr(search, "group"):
                search = search.group(1).split('/j_')

                if 1 < len(search) and search[1][:3] == "DDD":
                    keys.append(search[1])
                    print(line)
    return keys

key_creator()

Edit: Just to clarify: the string DDD can be followed by an indeterminate number of digits (DDD-23, DDD-342, DDD-4842, DDD-44332... would be possible entries as well).

def key_creator():
    log_data = read_log()
    keys = []
    for line in log_data:
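        # Collect every DDD-<digits> ID in this entry, in order of appearance
        s = re.findall(r'DDD-\d+', line)
        # Keep the second ID only when the entry has at least two
        if len(s) > 1:
            keys.append(s[1])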
    return keys
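
To check the findall approach without a log file on disk, here is a minimal, self-contained sketch that runs it on the sample log from the question. The names `SAMPLE_LOG` and `second_ids` are made up for this illustration and are not part of the original answer.

```python
import re

# Sample log copied from the question (assumed to be representative).
SAMPLE_LOG = """cs:444 - br:/main/j_DDD-50535/DDD-68009
cs:445 - br:/main/j_DDD-50535/j_DDD-70220
cs:446 - br:/main/j_DDD-50535/j_DDD-70117
cs:447-Merge from branch: /main/j_DDD-50544/j_DDD-61183
Requested by: Smith, John (UserID1)
cs:448-Merge from branch: /main/j_DDD-4822
Requested by: Grant, Huge (userID2)
cs:449-Daily automated release of 3.5.5.4
"""


def second_ids(log_text):
    """Return the second DDD-<digits> ID of every entry that has one."""
    keys = []
    # Each entry starts with "cs:", so splitting on it yields one chunk per entry.
    for entry in log_text.split("cs:"):
        ids = re.findall(r"DDD-\d+", entry)
        if len(ids) > 1:
            keys.append(ids[1])
    return keys


print(second_ids(SAMPLE_LOG))
# ['DDD-68009', 'DDD-70220', 'DDD-70117', 'DDD-61183']
```

Note that entry 444 is included even though its second ID has no "j_" prefix, while entry 448 is skipped because it contains only one ID. If IDs without the prefix should be ignored, the pattern would need to be `j_(DDD-\d+)` instead.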

You can use a proper regex pattern to match your request:

def key_creator():
    log_data = read_log()

    keys = []
    for line in log_data:
        # The IDs in the log use a hyphen (DDD-50535), so match "DDD-", not "DDD_"
        search = re.search(r'/j_(DDD-\d{5})\n', line)
        if search is not None:
            keys.append(search.group(1))
            print(line)
    return keys

The pattern requires the string DDD followed by a hyphen and exactly 5 digits, directly before a line break. re.search returns None if the pattern is not found; otherwise it returns a match object from which two groups can be read: the whole match (group(0)) and only the content of the parentheses (group(1)), which is already what you are looking for.
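
As a small illustration of the return value described above, the sketch below calls re.search on two entries taken from the question's log. It assumes the IDs are written with a hyphen, as they appear in the log (DDD-61183); the variable names `pattern`, `hit` and `miss` are only for this example.

```python
import re

# Assumption: IDs look like "DDD-" plus exactly five digits, as in the sample log.
pattern = re.compile(r"/j_(DDD-\d{5})\n")

# An entry whose last path segment is a j_-prefixed ID, and one without any ID.
hit = pattern.search("447-Merge from branch: /main/j_DDD-50544/j_DDD-61183\n")
miss = pattern.search("449-Daily automated release of 3.5.5.4\n")

print(repr(hit.group(0)))  # whole match, including "/j_" and the line break
print(repr(hit.group(1)))  # only the part inside the parentheses
print(miss)                # None, because the pattern does not occur
```

Because the pattern ends with a line break, the first ID in the path (/j_DDD-50544, which is followed by a slash) is passed over and only the last segment matches.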


 