What regular expression will match comma-separated number pairs, with the pairs separated by pipes?

Question

I'm currently trying to write a regex (in Python) that matches input that looks like this:

37.1000,-88.1000
37.1000,-88.1000|37.1450,-88.1060
37.1000,-88.1000|37.1450,-88.1060|35.1450,-83.1060

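So: pairs of decimal numbers separated by a comma, and then those pairs (if there is more than one) separated by |. I've tried a few things but can't seem to get a regex string that matches correctly.

Attempt 1: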

((((\d*\.?\d+,\d*\.?\d+)\|)+)|(\d*\.?\d+,\d*\.?\d+))

Attempt 2:

((((-?\d*\.?\d+,-?\d*\.?\d+)\|)+)|(-?\d*\.?\d+,-?\d*\.?\d+))

I'm hoping someone has done this before, or has enough regex experience to do something like this.

Answer 1

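If you want to match the whole string, you can match a decimal number and then repeat the same pattern preceded by a comma.

Then take that whole pattern and repeat it preceded by a |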

^[+-]?\d+\.\d+(?:,[+-]?\d+\.\d+)*(?:\|[+-]?\d+\.\d+(?:,[+-]?\d+\.\d+)*)*$

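^ Start of string
[+-]?\d+\.\d+ Match an optional + or -, then digits, a dot, and digits
(?: Non-capturing group
- ,[+-]?\d+\.\d+ Match the same pattern as before, preceded by a comma
)* Close the group and repeat it 0+ times
(?: Non-capturing group
- \| Match |
- [+-]?\d+\.\d+ Match an optional + or -, then digits, a dot, and digits
- (?: Non-capturing group
  - ,[+-]?\d+\.\d+ Match the same pattern as before, preceded by a comma
- )* Close the group and repeat it 0+ times
)* Close the group and repeat it 0+ times
$ End of string

Regex demo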
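As a quick check, the pattern above can be run with Python's re module. This is only a sketch: the names PATTERN, NUM, and STRICT_PAIRS are made up for illustration. Note that the pattern as written also accepts a lone number (or three or more comma-separated numbers in one group), so a stricter variant is included on the assumption that every group must hold exactly two numbers.

```python
import re

# Pattern from the answer above, compiled once.
PATTERN = re.compile(
    r"^[+-]?\d+\.\d+(?:,[+-]?\d+\.\d+)*(?:\|[+-]?\d+\.\d+(?:,[+-]?\d+\.\d+)*)*$"
)

# Hypothetical stricter variant: exactly two numbers per pipe-separated group.
NUM = r"[+-]?\d+\.\d+"
STRICT_PAIRS = re.compile(r"^{0},{0}(?:\|{0},{0})*$".format(NUM))

samples = [
    "37.1000,-88.1000",
    "37.1000,-88.1000|37.1450,-88.1060",
    "37.1000,-88.1000|37.1450,-88.1060|35.1450,-83.1060",
    "37.1000",            # a single number, not a pair
    "37.1000,-88.1000|",  # trailing pipe
]

for s in samples:
    # Print whether each pattern accepts the sample.
    print(s, bool(PATTERN.match(s)), bool(STRICT_PAIRS.match(s)))
```

The three inputs from the question pass both patterns. The lone number passes only the original pattern, and the trailing pipe is rejected by both.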

Answer 2

This is what parsers are for (that is, checking for the correct format):

from parsimonious.grammar import Grammar
from parsimonious.exceptions import ParseError

data = """
37.1000,-88.1000
37.1000,-88.1000|37.1450,-88.1060
37.1000,-88.1000|37.1450,-88.1060|35.1450,-83.1060
"""

# A line is one pair, optionally followed by more pairs separated by pipes.
grammar = Grammar(
    r"""
    line    = pair (pipe pair)*
    pair    = point ws? comma ws? point
    point   = ~"-?\d+(?:\.\d+)?"
    comma   = ","
    pipe    = "|"
    ws      = ~"\s+"
    """
)


for line in data.split("\n"):
    try:
        grammar.parse(line)
        print("Correct format: {}".format(line))
    except ParseError:
        # Raised when the line does not match the grammar.
        print("Not correct: {}".format(line))

This yields

Not correct: 
Correct format: 37.1000,-88.1000
Correct format: 37.1000,-88.1000|37.1450,-88.1060
Correct format: 37.1000,-88.1000|37.1450,-88.1060|35.1450,-83.1060
Not correct:

Both Not correct: lines come from the empty lines.

If you actually want to retrieve the values, you need to write another Visitor class:
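The empty lines appear because the triple-quoted string starts and ends with a newline, and split("\n") keeps the empty pieces on both sides. A minimal sketch of one way to skip them before validating (the names raw_lines and lines are only illustrative):

```python
data = """
37.1000,-88.1000
37.1000,-88.1000|37.1450,-88.1060
37.1000,-88.1000|37.1450,-88.1060|35.1450,-83.1060
"""

# split("\n") keeps the empty strings before the first and after the last newline.
raw_lines = data.split("\n")

# splitlines() drops the trailing empty piece; the filter drops the leading one.
lines = [line for line in data.splitlines() if line.strip()]

print(len(raw_lines), len(lines))
```

Whether blank lines should count as invalid or simply be ignored depends on where the input comes from, so treat the filter as optional.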

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor
from parsimonious.exceptions import ParseError

class Points(NodeVisitor):
    grammar = Grammar(
        r"""
        line    = pair (pipe pair)*
        pair    = point ws? comma ws? point
        point   = ~"-?\d+(?:\.\d+)?"
        comma   = ","
        pipe    = "|"
        ws      = ~"\s+"
        """
    )

    def generic_visit(self, node, visited_children):
        # Fall back to the raw node when there are no visited children.
        return visited_children or node

    def visit_pair(self, node, visited_children):
        # First and last children are the two points; the rest is ws/comma.
        x, *_, y = visited_children
        return (x.text, y.text)

    def visit_line(self, node, visited_children):
        pairs = [visited_children[0]]
        # Each item of the repeated group is [pipe, pair]; keep the pair.
        for potential_pair in [item[1] for item in visited_children[1]]:
            pairs.append(potential_pair)
        return pairs

point = Points()
for line in data.split("\n"):
    try:
        pairs = point.parse(line)
        print(pairs)
    except ParseError:
        print("Not correct: {}".format(line))

Answer 3

You don't even need regex. Keep it simple.

Step 1

Split on ,.

s.split(',')

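Step 2

Split on | and make sure each result is a float (or rather, that it can be converted to float without error). If you don't need the validation, you can drop that part of this step.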

r = s.split('|')
for v in r:
    try:
        float(v)
    except ValueError:
        print(v + ' is not a float')

Step 3

Combine.

Test it here

strings = [
    '37.1000,-88.1000',
    '37.1000,-88.1000|37.1450,-88.1060',
    '37.1000,-88.1000|37.1450,-88.1060|35.1450,-83.1060'
]

def split_on_comma(s):
    return s.split(',')

def split_on_bar(s):
    r = s.split('|')
    for v in r:
        try:
            float(v)
        except ValueError:
            print(v + ' is not a float')
    return r

for s in strings:
    for c in split_on_comma(s):
        print(split_on_bar(c))

Without the validation and the functions, your code becomes:

for s in strings:
    for c in s.split(','):
        for b in c.split('|'):
            print(b)

You can change the output to whatever you like, but this covers every step needed to split and validate the data.
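The code above splits on the comma first, which checks every number but loses track of which two numbers belong together. If the pairs themselves are needed, one possible variant (not part of the original answer) is to split on | first and on , second. The function name parse_pairs is hypothetical.

```python
def parse_pairs(s):
    """Return a list of (float, float) tuples, or raise ValueError."""
    pairs = []
    for chunk in s.split('|'):
        parts = chunk.split(',')
        if len(parts) != 2:
            raise ValueError('expected exactly two numbers in ' + repr(chunk))
        # float() raises ValueError itself if a part is not numeric.
        pairs.append((float(parts[0]), float(parts[1])))
    return pairs


strings = [
    '37.1000,-88.1000',
    '37.1000,-88.1000|37.1450,-88.1060',
    '37.1000,-88.1000|37.1450,-88.1060|35.1450,-83.1060',
]

for s in strings:
    print(parse_pairs(s))
```

Keep in mind that float() is more permissive than a digits-dot-digits regex: it also accepts things like '1e5', 'inf', or surrounding whitespace, so this is a looser check than the regex approach.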

Answer 4

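If you want to retrieve the values as pairs, you can use a simple regex or just split():

import re

values = ["37.1000,-88.1000|37.1450,-88.1060"]  # sample input (assumed)

for value in values:
    # Each match is one "a,b" chunk; the optional trailing | is not captured.
    pairs = re.findall(r"([\d. ,-]+)\|?", value)
    for pair in pairs:
        v1, v2 = pair.strip().split(",")
# or
for value in values:
    pairs = value.split("|")
    for pair in pairs:
        v1, v2 = pair.strip().split(",")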
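As a further sketch along the same lines, re.findall can hand back both numbers of each pair directly when the pattern has two capture groups. The pattern name PAIR and the sample value are assumptions for illustration, and note that findall only extracts matches; it does not verify that the whole string is well formed.

```python
import re

# Hypothetical pattern: one signed decimal, a comma, another signed decimal.
PAIR = re.compile(r"([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)")

value = "37.1000,-88.1000|37.1450,-88.1060"

# With two capture groups, findall returns a list of 2-tuples of strings.
pairs = [(float(a), float(b)) for a, b in PAIR.findall(value)]
print(pairs)
```

If the input must also be validated, combine this with an anchored whole-string match first.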

4 answers

Answer 1: score 3, accepted, 2019-11-04 17:55:27
Answer 2: score 3, 2019-11-04 18:01:30
Answer 3: score 2, 2019-11-04 18:22:23
Answer 4: score 1, 2019-11-04 17:57:48
