Extract a substring from a Python string

Question

I want to extract the substring before the 9-digit number below:

tmp = "place1_128017000_gw_cl_mask.tif"

The output should be place1

I could do this: tmp.split('_')[0], but I also want the solution to work for:

tmp = "place1_place2_128017000_gw_cl_mask.tif"

where the result would be: place1_place2

You can assume that the number will always be 9 digits long.
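As a quick illustration of the problem, applying the split approach from the question to both sample names shows where it falls short (the names samples and naive are only for illustration):

```python
samples = [
    "place1_128017000_gw_cl_mask.tif",
    "place1_place2_128017000_gw_cl_mask.tif",
]

# The naive approach from the question keeps only the first '_'-separated field.
naive = [s.split('_')[0] for s in samples]
print(naive)  # ['place1', 'place1']: the second one should be 'place1_place2'
```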

Answer 1

Assuming we can phrase your problem as wanting the substring up to, but not including, the underscore that is followed by an all-digit field, we can try:

import re

tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'^([^_]+(?:_[^_]+)*)_\d+_', tmp)
print(m.group(1))  # place1_place2
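Note that re.search returns None when nothing matches, so m.group(1) would raise AttributeError for a name with no numeric field. A minimal sketch that wraps the same pattern in a hypothetical helper, prefix_before_digits, and returns None in that case:

```python
import re

# Same pattern as above: one or more '_'-separated fields (captured),
# then an underscore, a run of digits, and another underscore.
PATTERN = re.compile(r'^([^_]+(?:_[^_]+)*)_\d+_')

def prefix_before_digits(name):
    """Return the captured prefix, or None if the pattern does not match."""
    m = PATTERN.search(name)
    return m.group(1) if m else None

for name in ("place1_128017000_gw_cl_mask.tif",
             "place1_place2_128017000_gw_cl_mask.tif",
             "no_number_here.tif"):
    print(prefix_before_digits(name))
# place1
# place1_place2
# None
```

Two things worth knowing about this pattern: \d+ accepts a digit run of any length (\d{9} would enforce the 9-digit assumption), and because (?:_[^_]+)* is greedy, a name with several numeric fields such as a_123_b_456_c yields a_123_b, i.e. everything before the last numeric field.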

Answer 2

Using a regular expression with a lookahead, here is a simple solution:

import re

tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'.+(?=_\d{9}_)', tmp)
print(m.group())

Result:

place1_place2

Note that \d{9} matches exactly 9 digits. The part of the regex inside (?= ... ) is a lookahead: it is not part of the actual match, but the match only succeeds if that text immediately follows it.
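To see what the lookahead changes, here is a small comparison of the same search with and without (?= ... ) (the variable names plain and ahead are illustrative):

```python
import re

tmp = "place1_place2_128017000_gw_cl_mask.tif"

# Without a lookahead, the "_<9 digits>_" part is consumed and included in the match.
plain = re.search(r'.+_\d{9}_', tmp)
print(plain.group())      # place1_place2_128017000_

# With a lookahead, that text must still follow, but it is not part of the match.
ahead = re.search(r'.+(?=_\d{9}_)', tmp)
print(ahead.group())      # place1_place2
print(tmp[ahead.end():])  # _128017000_gw_cl_mask.tif
```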

Answer 3

Use a regular expression:

import re

places = (
    "place1_128017000_gw_cl_mask.tif",
    "place1_place2_128017000_gw_cl_mask.tif",
)
pattern = re.compile(r"(place\d+(?:_place\d+)*)_\d{9}")  # raw string, so "\d" is not read as a string escape
for p in places:
    matched = pattern.match(p)
    if matched:
        print(matched.group(1))

prints:

place1

place1_place2

The regex works like this (adjust as needed, e.g., for fewer than 9 digits or a variable number of digits):

( starts a capture
place\d+ matches "place" followed by one or more digits
(?: starts a group but does not capture it (no capture is needed)
_place\d+ matches each additional "place" field
) closes the group
* repeats the preceding group zero or more times
) closes the capture
_ matches a literal underscore
\d{9} matches exactly 9 digits

The result is in the first (and only) capture group.
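If the step-by-step breakdown is useful to keep next to the code, Python's re.VERBOSE flag lets the same pattern carry it as comments (whitespace and # comments inside the pattern are ignored). A sketch, assuming the same inputs as above:

```python
import re

# The same pattern as above, annotated piece by piece.
pattern = re.compile(r"""
    (               # start the capture
      place\d+      # "place" followed by one or more digits
      (?:           # non-capturing group
        _place\d+   # an underscore and another "place<digits>" field
      )*            # repeated zero or more times
    )               # end the capture
    _               # a literal underscore
    \d{9}           # exactly 9 digits
""", re.VERBOSE)

for p in ("place1_128017000_gw_cl_mask.tif",
          "place1_place2_128017000_gw_cl_mask.tif"):
    matched = pattern.match(p)
    if matched:
        print(matched.group(1))
# place1
# place1_place2
```

Like the original, this only accepts prefixes made of place<digits> fields.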

Answer 4

Here's a possible solution without regex (unoptimized!):

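def extract(s):
    """Return the part of s before the first '_'-separated field of exactly 9 digits."""
    parts = []
    for x in s.split('_'):
        # A field made only of digits, 9 of them, marks the end of the prefix.
        if x.isdigit() and len(x) == 9:
            return '_'.join(parts)
        parts.append(x)
    return None  # no 9-digit field was found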

tmp = 'place1_128017000_gw_cl_mask.tif'
tmp2 = 'place1_place2_128017000_gw_cl_mask.tif'

print(extract(tmp))   # place1
print(extract(tmp2))  # place1_place2

4 answers

Answer 1
Score 3, 2022-07-13 03:47:00

Answer 2
Score 3, accepted, 2022-07-13 03:51:58

Answer 3
Score 1, 2022-07-13 03:50:47

Answer 4
Score 1, 2022-07-13 04:45:24
