Extract substring from a Python string
I want to extract the substring before the 9-digit number below:
tmp = "place1_128017000_gw_cl_mask.tif"
The output should be:
place1
I could do this:
tmp.split('_')[0]
but I also want the solution to work for:
tmp = "place1_place2_128017000_gw_cl_mask.tif"
where the result would be: place1_place2
You can assume that the number is always 9 digits long.
Assuming we can phrase your problem as wanting the substring up to, but not including, the underscore that is followed by a run of digits, we can try:
import re

tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'^([^_]+(?:_[^_]+)*)_\d+_', tmp)
print(m.group(1))  # place1_place2
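A slightly more defensive version of the same pattern, as a sketch: re.search returns None when nothing matches, so calling .group(1) directly would raise AttributeError on a name without a numeric field. The helper name prefix_before_number and the extra test names are hypothetical. Because the repetition is greedy, a name with several numeric fields yields everything before the last one.

```python
import re

# Same pattern as above: underscore-separated fields, then _<digits>_
PATTERN = re.compile(r'^([^_]+(?:_[^_]+)*)_\d+_')

def prefix_before_number(name):
    """Hypothetical helper: return the prefix before the numeric field, or None."""
    m = PATTERN.search(name)
    # re.search returns None when the pattern does not match
    return m.group(1) if m else None

for name in ("place1_128017000_gw_cl_mask.tif",
             "place1_place2_128017000_gw_cl_mask.tif",
             "no_number_here.tif"):
    print(name, "->", prefix_before_number(name))
```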
Here is a simple solution using a regular expression with a lookahead:
import re

tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'.+(?=_\d{9}_)', tmp)
print(m.group())
Result:
place1_place2
Note that \d{9} matches exactly 9 digits. The part of the regex inside (?= ... ) is a lookahead: it is not part of the actual match, but the match only succeeds if the lookahead pattern immediately follows it.
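To see that the lookahead is checked but not consumed, and how greediness affects the result, here is a small sketch. The second name, with two 9-digit fields, is hypothetical and not from the question: the greedy .+ backtracks only as far as necessary, so it stops before the last 9-digit field, while the non-greedy .+? stops before the first.

```python
import re

name = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'.+(?=_\d{9}_)', name)
# The lookahead checks "_128017000_" but does not consume it,
# so the match ends right before that underscore.
print(m.group())        # place1_place2
print(name[m.end():])   # _128017000_gw_cl_mask.tif

# Hypothetical name with two 9-digit fields
two = "a_111111111_b_222222222_c.tif"
print(re.search(r'.+(?=_\d{9}_)', two).group())   # greedy: a_111111111_b
print(re.search(r'.+?(?=_\d{9}_)', two).group())  # non-greedy: a
```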
Use a regular expression:
import re

places = (
    "place1_128017000_gw_cl_mask.tif",
    "place1_place2_128017000_gw_cl_mask.tif",
)
# Raw string so that \d reaches the regex engine unescaped
pattern = re.compile(r"(place\d+(?:_place\d+)*)_\d{9}")
for p in places:
    matched = pattern.match(p)
    if matched:
        print(matched.group(1))
prints:
place1
place1_place2
The regex works like this (adjust as needed, e.g., for fewer than 9 digits or a variable number of digits):
- ( starts a capture
- place\d+ matches "place" followed by one or more digits
- (?: starts a group, but does not capture it (no need to capture)
- _place\d+ matches more "places"
- ) closes the group
- * means zero or more repetitions of the previous group
- ) closes the capture
- _ matches the underscore before the number
- \d{9} matches exactly 9 digits
The result is in the first (and only) capture group.
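The same pattern can be written with re.VERBOSE so that each piece carries its explanation as a comment. This is a sketch; the name PLACE_PATTERN and the third filename (riverA_...) are hypothetical, the latter included to show that the pattern only matches names whose fields literally start with "place".

```python
import re

# Verbose form of (place\d+(?:_place\d+)*)_\d{9}; whitespace and
# comments inside the pattern are ignored because of re.VERBOSE
PLACE_PATTERN = re.compile(r"""
    (               # start the capture
      place\d+      # "place" followed by one or more digits
      (?:           # start a non-capturing group
        _place\d+   # another underscore-separated "placeN"
      )*            # close the group; repeat it zero or more times
    )               # close the capture
    _               # the underscore before the number
    \d{9}           # exactly 9 digits
""", re.VERBOSE)

names = (
    "place1_128017000_gw_cl_mask.tif",
    "place1_place2_128017000_gw_cl_mask.tif",
    "riverA_128017000_gw_cl_mask.tif",  # hypothetical: does not start with "place"
)
for name in names:
    m = PLACE_PATTERN.match(name)
    print(name, "->", m.group(1) if m else None)
```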
Here's a possible solution without regex (unoptimized!):
def extract(s):
    result = ''
    for x in s.split('_'):
        # A field of exactly nine digits marks the number; str.isdigit()
        # checks the text itself, so leading zeros still count
        if x.isdigit() and len(x) == 9:
            return result[:-1]  # drop the trailing underscore
        result += x + '_'
    return None  # no 9-digit field found

tmp = 'place1_128017000_gw_cl_mask.tif'
tmp2 = 'place1_place2_128017000_gw_cl_mask.tif'
print(extract(tmp))   # place1
print(extract(tmp2))  # place1_place2
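A more compact non-regex variant, as a sketch: find the index of the first field that is exactly nine digits and join everything before it. The function name prefix_before_nine_digits is hypothetical. If a name can contain more than one 9-digit field, note that this returns the prefix before the first one.

```python
def prefix_before_nine_digits(name):
    """Hypothetical helper: join the fields before the first 9-digit field."""
    parts = name.split('_')
    # Index of the first field made of exactly nine digits, or None
    idx = next((i for i, part in enumerate(parts)
                if len(part) == 9 and part.isdigit()), None)
    if idx is None:
        return None  # no 9-digit field found
    return '_'.join(parts[:idx])

print(prefix_before_nine_digits('place1_128017000_gw_cl_mask.tif'))         # place1
print(prefix_before_nine_digits('place1_place2_128017000_gw_cl_mask.tif'))  # place1_place2
```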