scala regex filter wrapped elements

I have input in the form of strings like the following:

ellipse {
    attribute = foo

    ellipse {
        attribute = foo
        line {
            attribute = foo
            attribute = foo
            attribute = foo
        }
        line {
            attribute = foo
            attribute = foo
        }
    }
}

Basically, this is about 2D elements that can hold other 2D elements inside them. My task is to write a regex that can separate parent elements from their children, so they can be parsed separately. In the case of:

rectangle1{
    attribute = foo
}
ellipse1{
    attribute = foo
    ellipse{
        rectangle{
            attribute = foo
        }
    }
}

I want to be able to call regex.findAllIn(string) and get back only the rectangle1 and ellipse1 strings, so I can parse them. I'm no expert with regexes, but I made an attempt, which of course fails.

I tried to get all the ellipses, points, or lines that include something but don't have something like 'ellipse {' or 'point {' above them:

(?s)(?!((ellipse|point|line) \\\\{)).+(ellipse|point|line) \\\\{.*\\\\}

but this doesn't work...

There most likely is a way to do what I want, but as I said I'm not an expert with regexes. If you have an answer for me, I would be very grateful for an explanation, since I would like to understand the solution. Thank you in advance!

Pure regexes are not a good fit for this task. Matching arbitrarily nested braces requires a recursive regex, and Java (and hence Scala) currently doesn't support recursive patterns.
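To make the limitation concrete, here is a minimal sketch (the object name `FlatRegexDemo`, the patterns, and the shortened one-line input are made up for illustration, not taken from the question). Without recursion, a greedy `.*` runs to the last closing brace and merges siblings, while a lazy `.*?` stops at the first closing brace and cuts a nested element short:

```scala
object FlatRegexDemo {
  // Shortened, single-line stand-in for the question's input (hypothetical).
  val input = "r1{ x = 1 } e1{ e{ y = 2 } }"

  // Greedy: `.*` extends to the LAST '}' in the input, so both
  // top-level elements end up in one match.
  val greedy: Option[String] = """\w+\s*\{.*\}""".r.findFirstIn(input)

  // Lazy: `.*?` stops at the FIRST '}', so the nested element e1
  // is truncated before its own closing brace.
  val lazyMatches: List[String] = """\w+\s*\{.*?\}""".r.findAllIn(input).toList

  def main(args: Array[String]): Unit = {
    println(greedy)      // Some(r1{ x = 1 } e1{ e{ y = 2 } })
    println(lazyMatches) // List(r1{ x = 1 }, e1{ e{ y = 2 })
  }
}
```

Neither variant can count braces, which is what finding the matching closing brace of a nested element needs.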

However, since you are using Scala, you can take advantage of the powerful parser combinator library:

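// scala-parser-combinators is a separate module since Scala 2.11
import scala.util.parsing.combinator._

object ParserCombinator extends App with JavaTokenParsers with PackratParsers {

  // A single `attribute = value` line
  case class Attr(value:String)
  // A 2D element with its own attributes and its nested child elements
  case class Fig2d(name:String, attrs:List[Attr], children:List[Fig2d])

  // name { attr* fig2d* } ; the recursive reference to fig2d handles nesting
  def fig2d:Parser[Fig2d] = (ident <~ "{") ~ rep(attr) ~ (rep(fig2d) <~ "}") ^^ {
    case name ~ attrs ~ children => Fig2d(name, attrs, children)
  }

  // Keeps only the value after `attribute =`
  def attr:Parser[Attr] = "attribute" ~> "=" ~> "\\S+".r ^^ Attr.apply

  // The whole input is a sequence of top-level elements
  def fig2dList = rep(fig2d)

  val input =
    """
      |rectangle1{
      |    attribute = foo
      |}
      |ellipse1{
      |    attribute = foo
      |    ellipse{
      |        rectangle{
      |            attribute = foo
      |        }
      |    }
      |}
    """.stripMargin

  println(parseAll(fig2dList, input))
}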

Prints:

 [13.5] parsed: List(Fig2d(rectangle1,List(Attr(foo)),List()), Fig2d(ellipse1,List(Attr(foo)),List(Fig2d(ellipse,List(),List(Fig2d(rectangle,List(Attr(foo)),List()))))))
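
If only the raw top-level strings are needed, as the question asks, a plain brace-depth scan may be enough without any parsing library. The following is a minimal sketch, not part of the answer above: `TopLevelSplitter` and `topLevelBlocks` are hypothetical names, and it assumes the braces are balanced and never appear inside attribute values.

```scala
object TopLevelSplitter {
  /** Splits `input` into its top-level `name { ... }` blocks by tracking brace depth.
    * Assumption: braces are balanced and do not occur inside attribute values. */
  def topLevelBlocks(input: String): List[String] = {
    val blocks = List.newBuilder[String]
    var depth = 0 // current nesting level
    var start = 0 // index where the current top-level block began
    for (i <- input.indices) {
      input.charAt(i) match {
        case '{' => depth += 1
        case '}' =>
          depth -= 1
          if (depth == 0) { // closing brace of a top-level element
            blocks += input.substring(start, i + 1).trim
            start = i + 1
          }
        case _ => // any other character is part of the current block
      }
    }
    blocks.result()
  }

  def main(args: Array[String]): Unit = {
    val input =
      """rectangle1{
        |    attribute = foo
        |}
        |ellipse1{
        |    attribute = foo
        |    ellipse{
        |        rectangle{
        |            attribute = foo
        |        }
        |    }
        |}""".stripMargin
    // Print just the name of each top-level block
    topLevelBlocks(input).foreach(b => println(b.takeWhile(_ != '{').trim))
  }
}
```

Each returned string is one complete top-level element, including its nested children, and can then be handed to whatever parses a single element.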
