
How to exclude group of characters in python

I want to write a script that returns the numbers whose power is 1. The user's input mixes numbers raised to a power with plain numbers. What I want is described below:

input = "+2**5+3+4**8-7"
Output = "3,-7"

I tried the regex re.findall(r"[+-]?[0-9]+[^[*][*][2]]", input) but it doesn't work. Thanks in advance :)

You need negative look-around assertions, plus boundary anchors:

r'(?<!\*\*)-?\b\d+\b(?!\*\*)'

The (?<!...) syntax is a negative look-behind: it only matches at positions where the text before it doesn't match the pattern. Similarly, the (?!...) syntax is a negative look-ahead that does the same for the following text. Together they ensure you only match numbers that are not exponents (they don't follow **) and don't have an exponent (they aren't followed by **).
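To see what each assertion contributes on its own, here is a minimal sketch. The sample string and the variable names are made up for illustration, and the patterns match single digits only to keep the effect visible:

```python
import re

expr = "2**5+3"  # illustrative input, not from the question

# Negative look-behind: digits that do not come right after "**"
not_an_exponent = re.findall(r"(?<!\*\*)\d", expr)

# Negative look-ahead: digits that are not followed by "**"
has_no_exponent = re.findall(r"\d(?!\*\*)", expr)

# Both assertions combined: neither an exponent nor a base
plain = re.findall(r"(?<!\*\*)\d(?!\*\*)", expr)

print(not_an_exponent)  # ['2', '3']
print(has_no_exponent)  # ['5', '3']
print(plain)            # ['3']
```

Only the combination rejects both the base 2 and the exponent 5.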

The \b boundary anchor only matches at the start or end of a string, and anywhere there's a word character followed by a non-word character or vice versa (so in between \w\W or \W\w , where \w happily includes digits but not arithmetic characters):

>>> import re
>>> input = "+2**5+3+4**8-7"
>>> re.findall(r'(?<!\*\*)-?\b\d+\b(?!\*\*)', input)
['3', '-7']

Note that I used \d to match digits, and removed the + from the pattern, since you don't want that in your expected output.
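The \b anchors matter as soon as numbers have more than one digit. The following sketch compares the pattern with and without them; the input string and variable names are assumptions chosen for the demonstration:

```python
import re

expr = "42**25+10"  # illustrative input with multi-digit base and exponent

# Without \b the engine can backtrack into a number and match part of it
without_anchors = re.findall(r"(?<!\*\*)-?\d+(?!\*\*)", expr)

# With \b a match has to cover a whole run of digits
with_anchors = re.findall(r"(?<!\*\*)-?\b\d+\b(?!\*\*)", expr)

print(without_anchors)  # ['4', '5', '10']
print(with_anchors)     # ['10']
```

Without the anchors, the 4 of 42 matches because it is followed by 2 rather than **, and the 5 of 25 matches because the two characters before it are *2 rather than **.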

You can play with the expression in the online regex101 demo; e.g. you can try it with numbers > 10 and with a single * for multiplication.

If you must support negative exponents, then the above won't suffice: in ...**-42 the 42 still matches, because ** does not directly precede the digits. In that case an extra negative look-behind after the -? that disallows **- is needed:

r'(?<!\*\*)-?(?<!\*\*-)\b\d+\b(?!\*\*)'

(Thanks to Casimir et Hippolyte for pointing this out and suggesting a solution for it).
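A quick comparison of the two patterns on an input with a negative exponent (the input string and the names BASIC and EXTENDED are made up for this sketch):

```python
import re

expr = "2**-42+3-7"  # illustrative input with a negative exponent

BASIC = r"(?<!\*\*)-?\b\d+\b(?!\*\*)"
EXTENDED = r"(?<!\*\*)-?(?<!\*\*-)\b\d+\b(?!\*\*)"

basic_result = re.findall(BASIC, expr)
extended_result = re.findall(EXTENDED, expr)

print(basic_result)     # ['42', '3', '-7']
print(extended_result)  # ['3', '-7']
```

The basic pattern picks up the 42 because the text right before it is *-, not **; the second look-behind closes that gap.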

However, at this point I'd suggest you switch to just parsing the expression into an abstract syntax tree and then walking the tree to extract the operands that are not part of an exponent:

import ast

class NumberExtractor(ast.NodeVisitor):
    def __init__(self):
        self.reset()

    def reset(self):
        self.numbers = []

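    def _handle_number(self, node):
        # Python 3.8+ represents numeric literals as ast.Constant
        if isinstance(node, ast.Constant):
            if isinstance(node.value, (int, float, complex)):
                return node.value
        # older versions use ast.Num, which newer releases deprecate and drop
        elif hasattr(ast, 'Num') and isinstance(node, ast.Num):
            return node.n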

    def visit_UnaryOp(self, node):
        # record a signed literal such as -7 or +3 as a single number
        if isinstance(node.op, (ast.UAdd, ast.USub)):
            operand = self._handle_number(node.operand)
            if operand is None:
                return
            elif isinstance(node.op, ast.UAdd):
                self.numbers.append(+operand)
            else:
                self.numbers.append(-operand)

    def visit_Constant(self, node):
        if isinstance(node.value, (int, float, complex)):
            self.numbers.append(node.value)

    def visit_Num(self, node):
        self.numbers.append(node.n)

    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Pow):
            return  # ignore exponentiation
        self.generic_visit(node)  # process the rest

def extract(expression):
    try:
        tree = ast.parse(expression, mode='eval')
    except SyntaxError:
        return []
    extractor = NumberExtractor()
    extractor.visit(tree)
    return extractor.numbers

This extracts just the numbers; subtraction won't produce a negative number:

>>> input = "+2**5+3+4**8-7"
>>> extract(input)
[3, 7]
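If you do want subtracted operands reported as negative numbers, as in the question's expected output, one possible variant passes a sign down the tree. This is only a sketch: signed_operands and walk are hypothetical names, it assumes Python 3.8+ (where literals are ast.Constant nodes), and it distributes the sign over parenthesised sub-expressions, which may or may not be what you want:

```python
import ast

def signed_operands(expression):
    """Collect numbers that are not part of a ** expression, with their sign."""
    found = []

    def walk(node, sign):
        if isinstance(node, ast.Expression):
            walk(node.body, sign)
        elif isinstance(node, ast.Constant):
            value = node.value
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                found.append(sign * value)
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
            walk(node.operand, -sign if isinstance(node.op, ast.USub) else sign)
        elif isinstance(node, ast.BinOp):
            if isinstance(node.op, ast.Pow):
                return  # ignore both base and exponent
            walk(node.left, sign)
            # the right-hand side of a subtraction flips the sign
            walk(node.right, -sign if isinstance(node.op, ast.Sub) else sign)

    walk(ast.parse(expression, mode="eval"), 1)
    return found

print(signed_operands("+2**5+3+4**8-7"))  # [3, -7]
```

Unlike extract(), this sketch does not catch SyntaxError, so invalid input raises.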

Moreover, it can handle arbitrary amounts of whitespace, and much more complex expressions than a regex could ever handle:

>>> extract("(10 + 15) * 41 ** (11 + 19 * 17) - 42")
[10, 15, 42]

re.findall(r"(?<!\*\*)(?<!\*\*[+-])[+-]?\b[0-9]++(?!\*\*)", input)

(?!\*\*) is a negative lookahead that makes sure the digits are not followed by two * characters.

Before Python 3.11, re doesn't support possessive quantifiers (the ++ above); on older versions you have to use the regex module from PyPI.
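If neither a recent Python nor the regex module is available, the possessive [0-9]++ can be emulated in plain re with a capturing lookahead followed by a backreference, since the engine does not backtrack into a lookahead once it has succeeded. A sketch under that assumption (PATTERN and first_power_terms are hypothetical names):

```python
import re

# (?=([0-9]+))\1 grabs the whole run of digits and never gives any back,
# which is what the possessive [0-9]++ does in the pattern above
PATTERN = re.compile(r"(?<!\*\*)(?<!\*\*[+-])[+-]?\b(?=([0-9]+))\1(?!\*\*)")

def first_power_terms(text):
    # finditer + group(0) is used because findall would return the inner group
    return [m.group(0) for m in PATTERN.finditer(text)]

print(first_power_terms("+2**5+3+4**8-7"))  # ['+3', '-7']
print(first_power_terms("42**2+5"))         # ['+5']
```

The second call shows the point of the possessive match: the 4 of 42 is not returned on its own.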


You could write a parser and check whatever you need. I know it is a bit long, but fun:)

$ cat lexer.py
import re
from collections import namedtuple

tokens = [
    r'(?P<TIMES>\*)',
    r'(?P<POW>(\+|-)?\d+\*\*\d+)',
    r'(?P<NUM>(\+|-)?\d+)'
    ]

master_re = re.compile('|'.join(tokens))
Token = namedtuple('Token', ['type','value'])
def tokenize(text):
    scan = master_re.scanner(text)
    return (Token(m.lastgroup, m.group())
            for m in iter(scan.match, None))

x = '+2**5+3+4**8-7'

required = []
for tok in tokenize(x):
    if tok.type == 'POW':
        coeff, exp = tok.value.split('**')
        if exp == '1':
            required.append(coeff)
    elif tok.type == 'NUM':
        required.append(tok.value)

print(required)

Output:

$ python lexer.py
['+3', '-7']

You can try this simple regular expression:

re.findall(r'[-\+]\d(?!\*\*)', search_data)
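Note that this pattern matches a single digit only. A possible adjustment for multi-digit numbers, shown as a sketch with a made-up search_data value, is to use \d+ together with a \b anchor. Both versions still require a leading sign, so an unsigned number at the start of the string is not matched, and a signed exponent such as **-5 is not excluded:

```python
import re

search_data = "+12**2+30-7"  # illustrative input with multi-digit numbers

# Original pattern: stops after one digit, so it cuts +12 down to +1
single_digit = re.findall(r'[-\+]\d(?!\*\*)', search_data)

# Adjusted pattern: whole numbers only, thanks to \d+ and \b
multi_digit = re.findall(r'[-+]\d+\b(?!\*\*)', search_data)

print(single_digit)  # ['+1', '+3', '-7']
print(multi_digit)   # ['+30', '-7']
```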
