The most Pythonic way to parse a specific sub-string in a string?
I have the following log and want to extract the second "DDD-xxxxx" ID from each entry (if a second DDD ID exists):
cs:444 - br:/main/j_DDD-50535/DDD-68009
cs:445 - br:/main/j_DDD-50535/j_DDD-70220
cs:446 - br:/main/j_DDD-50535/j_DDD-70117
cs:447-Merge from branch: /main/j_DDD-50544/j_DDD-61183
Requested by: Smith, John (UserID1)
cs:448-Merge from branch: /main/j_DDD-4822
Requested by: Grant, Huge (userID2)
cs:449-Daily automated release of 3.5.5.4
Using regex I found a workaround to get them, but I think there should be a much easier way:
import re

def read_log():
    log_file_name = "log"
    with open(log_file_name, "r") as file:
        # Split the whole file into entries at each "cs:" marker.
        log_file = file.read().split("cs:")
    return log_file

def key_creator():
    log_data = read_log()
    keys = []
    for line in log_data:
        # print(line)
        if line[:5].isdigit():
            search = re.search('/j_(.*)\n', line)
            if hasattr(search, "group"):
                search = search.group(1).split('/j_')
                if 1 < len(search) and search[1][:3] == "DDD":
                    keys.append(search[1])
                    print(line)
    return keys

key_creator()
Edit: Just to clarify: the string DDD can be followed by an indeterminate number of digits (DDD-23, DDD-342, DDD-4842, DDD-44332... would be possible entries as well).
def key_creator():
    log_data = read_log()
    keys = []
    for line in log_data:
        # Collect every DDD ID in the entry, in order of appearance.
        s = re.findall(r'(DDD-\d+)', line)
        if s and len(s) > 1:
            keys.append(s[1])
    return keys
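As a quick check of the findall approach above, here is a minimal self-contained sketch that runs on the sample log from the question instead of reading a file. The names SAMPLE_LOG, DDD_ID and second_ids are made up for this illustration, and the entries are assumed to be separated by "cs:" as in the question's read_log().

```python
import re

# Sample log copied from the question; held in memory for the demo.
SAMPLE_LOG = """\
cs:444 - br:/main/j_DDD-50535/DDD-68009
cs:445 - br:/main/j_DDD-50535/j_DDD-70220
cs:446 - br:/main/j_DDD-50535/j_DDD-70117
cs:447-Merge from branch: /main/j_DDD-50544/j_DDD-61183
Requested by: Smith, John (UserID1)
cs:448-Merge from branch: /main/j_DDD-4822
Requested by: Grant, Huge (userID2)
cs:449-Daily automated release of 3.5.5.4
"""

# "DDD-" followed by one or more digits, with or without the "j_" prefix.
DDD_ID = re.compile(r"DDD-\d+")


def second_ids(log_text):
    """Return the second DDD ID of every entry that has at least two."""
    keys = []
    for entry in log_text.split("cs:"):
        ids = DDD_ID.findall(entry)
        if len(ids) > 1:
            keys.append(ids[1])
    return keys


print(second_ids(SAMPLE_LOG))
# ['DDD-68009', 'DDD-70220', 'DDD-70117', 'DDD-61183']
```

Entries 448 and 449 are skipped because they contain fewer than two IDs.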
You can use a more specific regex pattern to match what you need:
def key_creator():
    log_data = read_log()
    keys = []
    for line in log_data:
        # Capture a j_-prefixed ID that sits at the end of its line.
        search = re.search(r'/j_(DDD-\d+)\n', line)
        if search is not None:
            keys.append(search.group(1))
            print(line)
    return keys
The pattern requires the string DDD followed by a hyphen and one or more digits. re.search returns None if the pattern is not found; otherwise it returns a match object with two groups: the whole match (group(0)) and the content of the parentheses only (group(1)), which is what you are looking for.
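To make the difference between "no match" and the two groups concrete, here is a small sketch. It assumes a hyphen and a variable number of digits in the pattern so that it fits the sample log from the question; the variable names are only for illustration.

```python
import re

# Assumed pattern: a "j_"-prefixed DDD ID directly before a newline.
pattern = re.compile(r"/j_(DDD-\d+)\n")

hit = pattern.search("445 - br:/main/j_DDD-50535/j_DDD-70220\n")
miss = pattern.search("449-Daily automated release of 3.5.5.4\n")

print(repr(hit.group(0)))  # whole match: '/j_DDD-70220\n'
print(hit.group(1))        # parenthesised part only: DDD-70220
print(miss)                # None, because the pattern does not occur
```

Checking "is not None" before calling group() avoids an AttributeError on entries without a match.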