
Finding repeating operands using regex - Python

I'm trying to search a file for expressions such as A*B.

A and B can consist of any characters from [A-Z], [a-z], [0-9] and may include < > ( ) [ ] _ . etc., but not commas, semicolons, whitespace, newlines, or any arithmetic operator (+ - \\ *). These are the 8 delimiters. There can also be spaces between A, * and B. In addition, the number of opening brackets must equal the number of closing brackets in both A and B.

I unsuccessfully tried something like this (not taking into account operators inside A and B):

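import re

# Compile once, outside the loop; the raw string passes \n to the regex engine as-is.
pattern = re.compile(r"( |,|;)(.*)[*](.*)( |,|;|\n)")

with open("test", "r") as fp:
    for line in fp:
        # Note: match() only tries to match at the start of the line.
        m = pattern.match(line)
        if m:
            print('Match found', m.group())
        else:
            print('No match')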

Example 1:

(A1 * B1.list(), C * D * E) should give 3 matches:

  1. A1 * B1.list()
  2. C * D
  3. D * E

An extension of the problem could be that commas, semicolons, whitespace, newlines, and arithmetic operators (+ - \\ *) are allowed in A and B if they appear inside brackets:

Example 2:

(A * B.max(C * D, E)) should give 2 matches:

  1. A * B.max(C * D, E)
  2. C * D

I'm new to regular expressions and curious to find a solution to this.

Regular expressions have limits, and the line between what a regular expression can match and what needs a real parser is thin. In my opinion, a parser is the more robust solution in your case.
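To make that limit concrete, here is a minimal sketch of a regex-only attempt at the simple case. The names `OPERAND_CHAR`, `PAIR`, and `find_pairs_simple` are made up for this illustration, and `/` is added to the delimiter set as an assumption (the question only lists `\`). A lookahead is used so that overlapping pairs such as `C * D` and `D * E` are both reported.

```python
import re

# One operand character: anything except whitespace and the delimiters
# , ; + - * / \   (treating "/" as a delimiter is an assumption).
OPERAND_CHAR = r"[^\s,;+\-*/\\]"

# The lookbehind makes sure we start at the beginning of an operand;
# the lookahead captures "A * B" without consuming it, so D in
# "C * D * E" can be the right operand of one pair and the left of the next.
PAIR = re.compile(
    rf"(?<!{OPERAND_CHAR})(?=({OPERAND_CHAR}+)\s*\*\s*({OPERAND_CHAR}+))"
)

def find_pairs_simple(text):
    """Return (left, right) operand pairs around each '*', ignoring bracket balance."""
    return PAIR.findall(text)

for left, right in find_pairs_simple("(A1 * B1.list(), C * D * E)"):
    print(left, "*", right)
```

On Example 1 this finds three pairs, but the first operand comes back as `(A1` and the last as `E)`: the pattern has no way to check that brackets inside an operand are balanced, which is exactly the part that calls for something beyond a plain regular expression.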

The examples in the question call for recursive (nested) patterns. Here too, a parser is superior to any regex flavor.

Have a look at this proposed solution: Equation parsing in Python.
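As a rough illustration of the parser-style approach (not the linked solution itself), the following sketch scans the text while tracking bracket depth. It splits on delimiters only at depth 0, reports every `operand * operand` pair at that level, and then recurses into the contents of each bracket group. All names here (`split_top_level`, `bracket_contents`, `find_products`) are hypothetical, and treating `<` and `>` as brackets and `/` as a delimiter are assumptions based on the question's wording.

```python
DELIMS = set(",;+-*/\\")          # whitespace is handled separately
OPENERS, CLOSERS = "([<", ")]>"   # assumption: < > count as brackets

def split_top_level(text):
    """Split text into ("operand", s) and ("op", ch) tokens at bracket depth 0."""
    tokens, buf, depth = [], [], 0

    def flush():
        if buf:
            tokens.append(("operand", "".join(buf)))
            buf.clear()

    for ch in text:
        if depth == 0 and (ch.isspace() or ch in DELIMS):
            flush()
            if not ch.isspace():
                tokens.append(("op", ch))
            continue
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS and depth > 0:
            depth -= 1
        buf.append(ch)  # inside brackets, delimiters stay part of the operand
    flush()
    return tokens

def bracket_contents(operand):
    """Return the text inside each outermost bracket group of an operand."""
    groups, depth, start = [], 0, 0
    for i, ch in enumerate(operand):
        if ch in OPENERS:
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch in CLOSERS and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(operand[start:i])
    return groups

def find_products(text):
    """Return all 'A * B' pairs, outer level first, then nested levels."""
    tokens = split_top_level(text)
    found = []
    for i in range(1, len(tokens) - 1):
        if (tokens[i] == ("op", "*")
                and tokens[i - 1][0] == "operand"
                and tokens[i + 1][0] == "operand"):
            found.append(f"{tokens[i - 1][1]} * {tokens[i + 1][1]}")
    for kind, value in tokens:
        if kind == "operand":
            for inner in bracket_contents(value):
                found.extend(find_products(inner))
    return found

print(find_products("(A1 * B1.list(), C * D * E)"))
print(find_products("(A * B.max(C * D, E))"))
```

For the two examples in the question this yields `A1 * B1.list()`, `C * D`, `D * E` and `A * B.max(C * D, E)`, `C * D` respectively. The sketch only counts bracket depth and does not check that bracket types match, so a real grammar-based parser would still be the safer choice for arbitrary input.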


 