Using Regular Expression in Python to determine the type of geometry

I have many MULTIPOLYGONS in WKT (well-known text) form, like these:

A single multipolygon with no holes:
MULTIPOLYGON(((11 -17, -8 -1, 14 -8, 18 -17, 3 -11, 0 18, -17 -12, -17 -10, -10 -13)))

A single multipolygon with one hole:
MULTIPOLYGON(((14 7, -12 18, -13 -12, -7 19, 0 -16, 11 16, 19 18),(5 -11, -12 1, 0 -4, 1 -1)))

Several multipolygons with no holes:
MULTIPOLYGON(((14 11, -14 -16, 16 18, 6 -8, -14 6, -20 -5, 12 -1, 2 -19, -15 10, 7 2)),((-16 -15, -16 -15, 19 -2, -2 -13, 3 -19, -16 14, -1 -20)),((0 2, -18 8, -20 17, 12 5, 13 17, -9 -8, -20 -8, 20 -6, -12 0, -9 -4, -5 -14, -16 -19)),((-15 -16, -14 2, 19 -18, 4 8, 18 -1, -2 -13)),((-10 9, -12 15, -16 20, -15 -13, -17 16, -11 3, 18 -13, -3 13, -6 1, 2 12)))

Several multipolygons where at least one has a hole:
MULTIPOLYGON(((-8 0, 10 -7, 15 0, 0 14, -11 -13, -15 -5),(-6 -19, -18 -8, -9 -18, -1 2, 10 -8, -1 -12, -9 -16)),((0 19, -8 -10, -12 -12, 15 -20, 9 9, 16 5),(8 4, 8 0, 2 7, -17 8, 13 17, -6 -7)),((-9 -1, 19 9, -15 -11, -4 -14, -3 18),(-4 -15, 7 8, 5 -6, 20 13, 0 7, -10 -18)),((19 0, -13 1, -10 -12, -8 7, -1 14, 17 11),(0 -10, -1 -20, -14 7)),((-20 -7, 3 -7, 15 2, -7 -7),(9 -18, 13 -2, -15 -8, -2 -9)),((-18 8, 4 15, -1 -12, -13 18, 8 -17, -14 -19, -7 -13, 1 2, -11 -15, 5 20, 12 -14, 4 -10, -17 8, -6 15, -18 15)),((15 -2, 14 2, 17 -2, 6 3, -16 2),(7 0, 17 10, 17 -17, 13 -3, 1 -8)))

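I need a program that determines which of these types a given geometry is. I tried to use regular expressions in Python. Since the only difference between the geometries is the parentheses, I tried to use them as the pattern, but it does not work. For example, this is how I tried to identify a single multipolygon with one hole: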

str1=poly_single_no_hole
regex=r'),('
p = re.compile(regex, re.IGNORECASE)
m1 = p.match(str1)
print 'str 1 (' + str1 + ') contains is a single multipolygon with no hole:', m1 is not None

Why does it not work when I use parentheses as the pattern? What are the possible solutions?

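Have you considered using a parser rather than regular expressions? (A parser can create actual MultiPolygon and Polygon objects, and those objects can have fields for whatever you need to extract from the text.) For example, something like this: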

class Multi_Polygon:
  def __init__(self):
    self.polygons = []

class Polygon:
  def __init__(self):
    self.num_loops = 0

class Token:
  def __init__(self, ch, line, column):
    self.ch = ch
    self.line = line
    self.column = column

class Parser:
  def __init__(self, text):
    self.tokens = []
    line = 1
    column = 0
    for ch in text:
      column += 1
      if ch in {'(', ')', ','}:
        self.tokens.append(Token(ch, line, column))
      elif ch == '\n':
        line += 1
        column = 0  # reset; the next character increments this to 1
    self.multi_polygons = []
    self.token_index = 0
    self.multi = None
    self.polygon = None
    self.parse()

  def eat(self, ch):
    token = self.tokens[self.token_index]
    if ch != token.ch:
      message = "line %d, column %d, expected '%s', but found '%s'" % (token.line, token.column, ch, token.ch)
      raise Exception(message)
    self.token_index += 1

  def left_paren(self):
    self.eat('(')

  def right_paren(self):
    self.eat(')')

  def ate_comma(self):
    ch = self.tokens[self.token_index].ch
    if (ch != ','):
      return False
    self.token_index += 1
    return True

  def parse(self):
    while self.token_index < len(self.tokens):
      self.parse_multi_polygon()

  def parse_multi_polygon(self):
    self.multi = Multi_Polygon()
    self.left_paren()
    while True:
      self.parse_polygon()
      if not self.ate_comma():
        break
    self.right_paren()
    self.multi_polygons.append(self.multi)
    self.multi = None

  def parse_polygon(self):
    self.polygon = Polygon()
    self.left_paren()
    while True:
      self.parse_loop()
      if not self.ate_comma():
        break
    self.right_paren()
    self.multi.polygons.append(self.polygon)
    self.polygon = None

  def parse_loop(self):
    self.left_paren()
    num_points = 0
    while self.ate_comma():
      num_points += 1
    self.right_paren()
    self.polygon.num_loops += 1

# input_text is assumed to hold the WKT text to analyze
parser = Parser(input_text)
for index, multi_polygon in enumerate(parser.multi_polygons):
  print("multi_polygon %d has %d polygons" % (index, len(multi_polygon.polygons)))
  for i, p in enumerate(multi_polygon.polygons):
    print("  polygon %d has %d loops" % (i, p.num_loops))

Running this program on your example text gives this output:

multi_polygon 0 has 1 polygons
  polygon 0 has 1 loops
multi_polygon 1 has 1 polygons
  polygon 0 has 2 loops
multi_polygon 2 has 5 polygons
  polygon 0 has 1 loops
  polygon 1 has 1 loops
  polygon 2 has 1 loops
  polygon 3 has 1 loops
  polygon 4 has 1 loops
multi_polygon 3 has 7 polygons
  polygon 0 has 2 loops
  polygon 1 has 2 loops
  polygon 2 has 2 loops
  polygon 3 has 2 loops
  polygon 4 has 2 loops
  polygon 5 has 1 loops
  polygon 6 has 2 loops
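The counts above are enough to name the four categories from the question: more than one polygon means "several", and any polygon with more than one loop has a hole. Here is a minimal standalone sketch of that idea that only tracks parenthesis depth. The `classify` function and its return labels are illustrative names, not part of the parser above, and it assumes well-formed input containing a single MULTIPOLYGON. The third sample is a shortened, made-up geometry.

```python
def classify(wkt):
    """Return (polygon count label, hole label) for one MULTIPOLYGON string."""
    depth = 0
    loops_per_polygon = []
    for ch in wkt:
        if ch == '(':
            depth += 1
            if depth == 2:
                # Depth 2 opens a new polygon.
                loops_per_polygon.append(0)
            elif depth == 3:
                # Depth 3 opens a new loop (ring) of the current polygon.
                loops_per_polygon[-1] += 1
        elif ch == ')':
            depth -= 1
    count = 'several' if len(loops_per_polygon) > 1 else 'single'
    holes = 'hole' if any(n > 1 for n in loops_per_polygon) else 'no hole'
    return count, holes


no_hole = 'MULTIPOLYGON(((11 -17, -8 -1, 14 -8, 18 -17, 3 -11, 0 18, -17 -12, -17 -10, -10 -13)))'
one_hole = 'MULTIPOLYGON(((14 7, -12 18, -13 -12, -7 19, 0 -16, 11 16, 19 18),(5 -11, -12 1, 0 -4, 1 -1)))'
several = 'MULTIPOLYGON(((0 0, 1 0, 1 1)),((2 2, 3 2, 3 3)))'  # shortened illustrative sample

for wkt in (no_hole, one_hole, several):
    print(classify(wkt))
```

This does no validation, so unbalanced parentheses or other geometry types would need extra checks.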

You can use the following pattern:

MULTIPOLYGON\(\(\(-?\d+ -?\d+(?:, -?\d+ -?\d+)*\),\(-?\d+ -?\d+(?:, -?\d+ -?\d+)*\)\)\)

Explanation of the pattern elements:

\(       # literal parenthesis (must be escaped)
(?:..)   # non capturing group (useful to repeat the same pattern with * or +)
\d       # a digit
-?       # optional -
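As a rough sketch of how this pattern could be used from Python (the variable names are only illustrative): the attempt in the question fails for two reasons. The unescaped `),(` is not a valid regular expression, and `re.match` only succeeds at the very start of the string, so `re.search` is the safer choice when looking for something inside the text.

```python
import re

# The unescaped pattern from the question does not even compile.
try:
    re.compile(r'),(')
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print('unescaped pattern is rejected:', exc)

# The pattern above: one polygon made of exactly two rings (outer ring plus one hole).
ONE_HOLE = re.compile(
    r'MULTIPOLYGON\(\(\(-?\d+ -?\d+(?:, -?\d+ -?\d+)*\),'
    r'\(-?\d+ -?\d+(?:, -?\d+ -?\d+)*\)\)\)'
)

no_hole = 'MULTIPOLYGON(((11 -17, -8 -1, 14 -8, 18 -17, 3 -11, 0 18, -17 -12, -17 -10, -10 -13)))'
one_hole = 'MULTIPOLYGON(((14 7, -12 18, -13 -12, -7 19, 0 -16, 11 16, 19 18),(5 -11, -12 1, 0 -4, 1 -1)))'

print(ONE_HOLE.search(no_hole) is not None)   # False
print(ONE_HOLE.search(one_hole) is not None)  # True
```

Note that this pattern only covers the "single multipolygon with one hole" case and assumes integer coordinates separated exactly as in the samples.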

This might help you, although it was tested in Perl.

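Edit: You need a regex-processing tool to manage this kind of work.
I used RegexFormat4, which took the parenthesized string and formatted it with indentation. Then it escaped the literals. I cut and pasted the parts together, compressed each one separately, then compressed the whole thing. Tested in Perl; it took about 10 minutes.
You can add any other combinations you need, but I recommend using a tool for this kind of problem.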

Good luck!

 #  (MULTIPOLYGON\(\(\([^)]+\)\)\))|(MULTIPOLYGON\(\(\([^)]+\)(?:,\([^)]+\))+\)\))|(MULTIPOLYGON\(\(\([^)]+\)\)(?:,\(\([^)]+\)\))+\))|(MULTIPOLYGON\((?:(?:(?:(?<=\)),|)\(\([^)]+\)\))*(?:(?:(?<=\)),|)\(\([^)]+\)(?:,\([^)]+\))+\))+(?:(?:(?<=\)),|)\(\([^)]+\)\))*)+\))


    (  # (1)
         # A single multipolygon with no holes:
         #  MULTIPOLYGON\(\(\([^)]+\)\)\)
         MULTIPOLYGON
         \(
         \(
         \( [^)]+ \)
         \)
         \)
    )
 |  
    (  # (2)
         # A single multipolygon with one (or more) hole:
         #  MULTIPOLYGON\(\(\([^)]+\)(?:,\([^)]+\))+\)\)
         MULTIPOLYGON
         \(
         \(
         \( [^)]+  \)
         (?: , \( [^)]+  \) )+
         \)
         \)
    )
 |  
    (  # (3)
         # Several multipolygons with no holes:
         #  MULTIPOLYGON\(\(\([^)]+\)\)(?:,\(\([^)]+\)\))+\)
         MULTIPOLYGON
         \(
         \(
         \(  [^)]+  \)
         \)
         (?:
              ,
              \(
              \( [^)]+ \)
              \)
         )+
         \)
    )
 |  
    (  # (4)
         # Several multipolygons where at least one has a hole:
         #   MULTIPOLYGON\((?:(?:(?:(?<=\)),|)\(\([^)]+\)\))*(?:(?:(?<=\)),|)\(\([^)]+\)(?:,\([^)]+\))+\))+(?:(?:(?<=\)),|)\(\([^)]+\)\))*)+\)
         MULTIPOLYGON
         \(
         (?:
              (?:
                   (?:
                        (?<= \) )
                        ,
                     |  
                   )
                   \(
                   \(  [^)]+  \)
                   \)
              )*
              (?:
                   (?:
                        (?<= \) )
                        ,
                     |  
                   )
                   \(
                   \( [^)]+  \)
                   (?: , \( [^)]+  \) )+
                   \)
              )+
              (?:
                   (?:
                        (?<= \) )
                        ,
                     |  
                   )
                   \(
                   \( [^)]+ \)
                   \)
              )*
         )+
         \)
    )

Perl test case

 my $str = do { local $/; <DATA> };  # slurp all of DATA, not just its first line

 while ( $str =~ /(MULTIPOLYGON\(\(\([^)]+\)\)\))|(MULTIPOLYGON\(\(\([^)]+\)(?:,\([^)]+\))+\)\))|(MULTIPOLYGON\(\(\([^)]+\)\)(?:,\(\([^)]+\)\))+\))|(MULTIPOLYGON\((?:(?:(?:(?<=\)),|)\(\([^)]+\)\))*(?:(?:(?<=\)),|)\(\([^)]+\)(?:,\([^)]+\))+\))+(?:(?:(?<=\)),|)\(\([^)]+\)\))*)+\))/g )
 {

    if ( defined $1 )
         { print "1 - '$1'\n----------\n";  }
    if ( defined $2 )
         { print "2 - '$2'\n----------\n";  }
    if ( defined $3 )
         { print "3 - '$3'\n----------\n";  }
    if ( defined $4 )
         { print "4 - '$4'\n----------\n";  }
 }

 __DATA__

 A single multipolygon with no holes:
 MULTIPOLYGON(((11 -17, -8 -1, 14 -8, 18 -17, 3 -11, 0 18, -17 -12, -17 -10, -10 -13)))

 A single multipolygon with one hole:
 MULTIPOLYGON(((14 7, -12 18, -13 -12, -7 19, 0 -16, 11 16, 19 18),(5 -11, -12 1, 0 -4, 1 -1)))

 Several multipolygons with no holes:
 MULTIPOLYGON(((14 11, -14 -16, 16 18, 6 -8, -14 6, -20 -5, 12 -1, 2 -19, -15 10, 7 2)),((-16 -15, -16 -15, 19 -2, -2 -13, 3 -19, -16 14, -1 -20)),((0 2, -18 8, -20 17, 12 5, 13 17, -9 -8, -20 -8, 20 -6, -12 0, -9 -4, -5 -14, -16 -19)),((-15 -16, -14 2, 19 -18, 4 8, 18 -1, -2 -13)),((-10 9, -12 15, -16 20, -15 -13, -17 16, -11 3, 18 -13, -3 13, -6 1, 2 12)))

 Several multipolygons where at least one has a hole:
 MULTIPOLYGON(((-8 0, 10 -7, 15 0, 0 14, -11 -13, -15 -5),(-6 -19, -18 -8, -9 -18, -1 2, 10 -8, -1 -12, -9 -16)),((0 19, -8 -10, -12 -12, 15 -20, 9 9, 16 5),(8 4, 8 0, 2 7, -17 8, 13 17, -6 -7)),((-9 -1, 19 9, -15 -11, -4 -14, -3 18),(-4 -15, 7 8, 5 -6, 20 13, 0 7, -10 -18)),((19 0, -13 1, -10 -12, -8 7, -1 14, 17 11),(0 -10, -1 -20, -14 7)),((-20 -7, 3 -7, 15 2, -7 -7),(9 -18, 13 -2, -15 -8, -2 -9)),((-18 8, 4 15, -1 -12, -13 18, 8 -17, -14 -19, -7 -13, 1 2, -11 -15, 5 20, 12 -14, 4 -10, -17 8, -6 15, -18 15)),((15 -2, 14 2, 17 -2, 6 3, -16 2),(7 0, 17 10, 17 -17, 13 -3, 1 -8)))

Output >>

 1 - 'MULTIPOLYGON(((11 -17, -8 -1, 14 -8, 18 -17, 3 -11, 0 18, -17 -12, -17 -10,
  -10 -13)))'
 ----------
 2 - 'MULTIPOLYGON(((14 7, -12 18, -13 -12, -7 19, 0 -16, 11 16, 19 18),(5 -11, -
 12 1, 0 -4, 1 -1)))'
 ----------
 3 - 'MULTIPOLYGON(((14 11, -14 -16, 16 18, 6 -8, -14 6, -20 -5, 12 -1, 2 -19, -1
 5 10, 7 2)),((-16 -15, -16 -15, 19 -2, -2 -13, 3 -19, -16 14, -1 -20)),((0 2, -1
 8 8, -20 17, 12 5, 13 17, -9 -8, -20 -8, 20 -6, -12 0, -9 -4, -5 -14, -16 -19)),
 ((-15 -16, -14 2, 19 -18, 4 8, 18 -1, -2 -13)),((-10 9, -12 15, -16 20, -15 -13,
  -17 16, -11 3, 18 -13, -3 13, -6 1, 2 12)))'
 ----------
 4 - 'MULTIPOLYGON(((-8 0, 10 -7, 15 0, 0 14, -11 -13, -15 -5),(-6 -19, -18 -8, -
 9 -18, -1 2, 10 -8, -1 -12, -9 -16)),((0 19, -8 -10, -12 -12, 15 -20, 9 9, 16 5)
 ,(8 4, 8 0, 2 7, -17 8, 13 17, -6 -7)),((-9 -1, 19 9, -15 -11, -4 -14, -3 18),(-
 4 -15, 7 8, 5 -6, 20 13, 0 7, -10 -18)),((19 0, -13 1, -10 -12, -8 7, -1 14, 17
 11),(0 -10, -1 -20, -14 7)),((-20 -7, 3 -7, 15 2, -7 -7),(9 -18, 13 -2, -15 -8,
 -2 -9)),((-18 8, 4 15, -1 -12, -13 18, 8 -17, -14 -19, -7 -13, 1 2, -11 -15, 5 2
 0, 12 -14, 4 -10, -17 8, -6 15, -18 15)),((15 -2, 14 2, 17 -2, 6 3, -16 2),(7 0,
  17 10, 17 -17, 13 -3, 1 -8)))'
 ----------
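Since the question asks about Python, here is a hedged sketch of the same four-alternative pattern used with Python's `re` module. The `LABELS` dictionary and the `geometry_type` function are illustrative names, not part of the Perl test above. Because exactly one of the four top-level groups takes part in a match, `match.lastindex` gives the number of the alternative that matched. The last two samples are shortened, made-up geometries.

```python
import re

# The four alternatives from the pattern above, one capturing group each.
GEOMETRY = re.compile(
    r'(MULTIPOLYGON\(\(\([^)]+\)\)\))'
    r'|(MULTIPOLYGON\(\(\([^)]+\)(?:,\([^)]+\))+\)\))'
    r'|(MULTIPOLYGON\(\(\([^)]+\)\)(?:,\(\([^)]+\)\))+\))'
    r'|(MULTIPOLYGON\((?:(?:(?:(?<=\)),|)\(\([^)]+\)\))*'
    r'(?:(?:(?<=\)),|)\(\([^)]+\)(?:,\([^)]+\))+\))+'
    r'(?:(?:(?<=\)),|)\(\([^)]+\)\))*)+\))'
)

LABELS = {
    1: 'single, no holes',
    2: 'single, one or more holes',
    3: 'several, no holes',
    4: 'several, at least one with a hole',
}


def geometry_type(wkt):
    """Return the label of the alternative that matched, or None."""
    match = GEOMETRY.search(wkt)
    return LABELS[match.lastindex] if match else None


samples = [
    'MULTIPOLYGON(((11 -17, -8 -1, 14 -8, 18 -17, 3 -11, 0 18, -17 -12, -17 -10, -10 -13)))',
    'MULTIPOLYGON(((14 7, -12 18, -13 -12, -7 19, 0 -16, 11 16, 19 18),(5 -11, -12 1, 0 -4, 1 -1)))',
    'MULTIPOLYGON(((0 0, 1 0, 1 1)),((2 2, 3 2, 3 3)))',                        # illustrative
    'MULTIPOLYGON(((0 0, 9 0, 9 9),(1 1, 2 1, 2 2)),((20 20, 30 20, 30 30)))',  # illustrative
]

for wkt in samples:
    print(geometry_type(wkt))
```

The fourth alternative nests repeated groups, so it may backtrack heavily on malformed input; for untrusted data a depth-counting approach is likely safer.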
