
How should I make these regex capture groups more succinct?

I'm using Python's re library to do this, but it's a basic regex question.

I am receiving a string of coordinate information in degrees-minutes-seconds format without spaces, and I'm parsing it into discrete coordinate pairs for conversion.

The string is fed to me looking like this (fake coordinates, for example):

102030N0102030E203040N0203040E304050N0304050E405060N0405060E

I am catching it like this:

import re

coordstr = '102030N0102030E203040N0203040E304050N0304050E405060N0405060E'

coords = re.match(
    re.compile(r"^(\d+[NS]{1}\d+[EW]{1})(\d+[NS]{1}\d+[EW]{1})(\d+[NS]{1}\d+[EW]{1})(\d+[NS]{1}\d+[EW]{1})"),
    coordstr)

for x in coords.groups():
    print(x)

which gives me

102030N0102030E
203040N0203040E
304050N0304050E
405060N0405060E

And allows me to address each coordinate pair as coords.group(1), coords.group(2) and so on.

So it works, but it feels like I'm being too verbose in the pattern. Is there a more succinct way to crawl the line with one of the capture groups, and add each matched group to .groups() as it's encountered? I know I could do it with brute-force string slicing, but that seems like more trouble than it's worth.

I've read this but it doesn't seem to address what I'm going after in this question.

Because this is for an enterprise and these strings describe raster bounds, I will be validating the string before introducing the regex search and falling back to a gdal object if the string is not found (or is corrupted).

Since you will pre-validate the strings before processing them with regex, you do not need re.search / re.match with several groups that share an identical pattern. You can use re.findall to get all matches of the \d+[NS]\d+[EW] pattern from your string:

import re
coordstr = '102030N0102030E203040N0203040E304050N0304050E405060N0405060E'
coords = re.findall(r'\d+[NS]\d+[EW]', coordstr)
for x in coords:
    print(x)

Output:

102030N0102030E
203040N0203040E
304050N0304050E
405060N0405060E

See the Python demo.
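As background on why a single repeated group cannot feed .groups() the way the question hopes: in Python's re module, a capture group placed under a quantifier keeps only its last repetition. The short sketch below, reusing the sample string from the question, contrasts that behaviour with re.findall (the variable names `last_only` and `pairs` are just illustrative):

```python
import re

coordstr = '102030N0102030E203040N0203040E304050N0304050E405060N0405060E'

# One group repeated with + matches the whole string,
# but the group itself retains only the final repetition.
m = re.match(r'^(\d+[NS]\d+[EW])+$', coordstr)
last_only = m.groups()

# re.findall returns every non-overlapping match as a list element.
pairs = re.findall(r'\d+[NS]\d+[EW]', coordstr)

print(last_only)  # ('405060N0405060E',)
print(pairs[0])   # 102030N0102030E
```

Individual pairs are then addressed by list index (pairs[0], pairs[1], ...) rather than by group number.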

NOTE: the list of matches returned by re.findall will always be in the same order as they are in the source text, see this SO post.
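Since the question mentions validating the string first and then converting each pair, here is one possible sketch of both steps. It assumes latitude is encoded as DDMMSS and longitude as DDDMMSS, which is only inferred from the sample string and not confirmed by the question. The helper names `parse_pairs` and `dms_to_decimal` are hypothetical, and no range checking is done on minutes or seconds.

```python
import re

# Assumed layout (inferred from the sample, not confirmed):
# DDMMSS + N/S followed by DDDMMSS + E/W.
PAIR_SRC = r'(\d{2})(\d{2})(\d{2})([NS])(\d{3})(\d{2})(\d{2})([EW])'
PAIR = re.compile(PAIR_SRC)
WHOLE = re.compile(r'(?:' + PAIR_SRC + r')+')


def dms_to_decimal(deg, minutes, seconds, hemisphere):
    """Convert degree/minute/second strings to signed decimal degrees."""
    value = int(deg) + int(minutes) / 60 + int(seconds) / 3600
    # South and West are negative by the usual convention.
    return -value if hemisphere in 'SW' else value


def parse_pairs(text):
    """Return a list of (lat, lon) tuples, or None if text is not
    made up entirely of coordinate pairs in the assumed layout."""
    if not WHOLE.fullmatch(text):
        return None  # caller can fall back to another source here
    return [
        (dms_to_decimal(*m.group(1, 2, 3, 4)),
         dms_to_decimal(*m.group(5, 6, 7, 8)))
        for m in PAIR.finditer(text)
    ]


coordstr = '102030N0102030E203040N0203040E304050N0304050E405060N0405060E'
result = parse_pairs(coordstr)
print(result[0])
```

Validating with fullmatch before iterating means a corrupted string is rejected as a whole instead of silently yielding a partial list of pairs.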
