
Scala regex: filter wrapped elements

I have input in the form of strings like the following:

ellipse {
    attribute = foo

    ellipse {
        attribute = foo
        line {
            attribute = foo
            attribute = foo
            attribute = foo
        }
        line {
            attribute = foo
            attribute = foo
        }
    }
}

Basically, it is about 2D elements that can hold other 2D elements inside them. My task is to write a regex that can separate parent elements from their children, so they can be parsed separately. In the case of:

rectangle1{
    attribute = foo
}
ellipse1{
    attribute = foo
    ellipse{
        rectangle{
            attribute = foo
        }
    }
}

I want to be able to call regex.findAllIn(string) and get back only the rectangle1 and ellipse1 strings, so that I can parse them. I'm no expert with regexes, but I made an attempt, which of course fails:

I tried to get all the ellipses, points, or lines that contain something but do not have something like 'ellipse {' or 'point {' above them:

(?s)(?!((ellipse|point|line) \\\\{)).+(ellipse|point|line) \\\\{.*\\\\}

but this doesn't work.

There is most likely a way to do what I want, but as I said, I'm not an expert with regexes. If you have an answer for me, I would be very grateful for an explanation, since I would like to understand the solution. Thank you in advance!

Pure regular expressions are not a good fit for this task. Matching arbitrarily nested braces requires recursive regex, which Java (and hence Scala) currently does not support.
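If only the raw text of each top-level element is needed, the nesting can be tracked by counting brace depth instead of using a regex. The following is a minimal sketch, not part of the original answer: the names `TopLevelSplit` and `topLevelBlocks` are made up for illustration, and it assumes that braces are balanced and never appear inside attribute values.

```scala
object TopLevelSplit {

  /** Splits the input into top-level `name { ... }` blocks by tracking brace depth.
    * Assumption: braces are balanced and do not occur inside attribute values.
    */
  def topLevelBlocks(input: String): List[String] = {
    val blocks = List.newBuilder[String]
    var depth = 0 // how many '{' are currently open
    var start = 0 // index where the current top-level block begins
    for (i <- input.indices) {
      input(i) match {
        case '{' => depth += 1
        case '}' =>
          depth -= 1
          // Back at depth 0: a top-level element has just been closed.
          if (depth == 0) {
            blocks += input.substring(start, i + 1).trim
            start = i + 1
          }
        case _ => // any other character is part of the current block
      }
    }
    blocks.result()
  }
}
```

Each returned string is one complete parent element, including its children, so it can be handed to a separate parsing step.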

However, since you are using Scala, you can take advantage of the powerful parser combinator library:

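import scala.util.parsing.combinator.{JavaTokenParsers, PackratParsers}

// Requires the scala-parser-combinators module (a separate library since Scala 2.11).
object ParserCombinator extends App with JavaTokenParsers with PackratParsers {

  case class Attr(value:String)
  case class Fig2d(name:String, attrs:List[Attr], children:List[Fig2d])

  // A figure is: name "{" attributes* nested-figures* "}". The recursive call handles nesting.
  def fig2d:Parser[Fig2d] = (ident <~ "{") ~ rep(attr) ~ (rep(fig2d) <~ "}") ^^ {
    case name ~ attrs ~ children => Fig2d(name, attrs, children)
  }

  // An attribute is the literal "attribute", then "=", then a run of non-whitespace characters.
  def attr:Parser[Attr] = "attribute" ~> "=" ~> "\\S+".r ^^ Attr.apply

  // The whole input is a sequence of top-level figures.
  def fig2dList = rep(fig2d)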

  val input =
    """
      |rectangle1{
      |    attribute = foo
      |}
      |ellipse1{
      |    attribute = foo
      |    ellipse{
      |        rectangle{
      |            attribute = foo
      |        }
      |    }
      |}
    """.stripMargin


  println(parseAll(fig2dList, input))
}

Prints:

 [13.5] parsed: List(Fig2d(rectangle1,List(Attr(foo)),List()), Fig2d(ellipse1,List(Attr(foo)),List(Fig2d(ellipse,List(),List(Fig2d(rectangle,List(Attr(foo)),List()))))))
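To work with the parsed elements rather than print them, the `ParseResult` returned by `parseAll` can be pattern matched. The sketch below repeats the grammar from above so that it is self-contained. The `FigParser` object and its `topLevel` method are hypothetical names introduced here, and the code assumes the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object FigParser extends JavaTokenParsers {

  case class Attr(value: String)
  case class Fig2d(name: String, attrs: List[Attr], children: List[Fig2d])

  // Same grammar as in the answer above.
  def attr: Parser[Attr] = "attribute" ~> "=" ~> "\\S+".r ^^ Attr.apply

  def fig2d: Parser[Fig2d] = (ident <~ "{") ~ rep(attr) ~ (rep(fig2d) <~ "}") ^^ {
    case name ~ attrs ~ children => Fig2d(name, attrs, children)
  }

  /** Returns the top-level figures, or a message describing where parsing failed. */
  def topLevel(input: String): Either[String, List[Fig2d]] =
    parseAll(rep(fig2d), input) match {
      case Success(figs, _)   => Right(figs)
      case Failure(msg, next) => Left(s"$msg (at ${next.pos})")
      case Error(msg, next)   => Left(s"$msg (at ${next.pos})")
    }
}
```

On the question's second example, `FigParser.topLevel(input)` should yield a `Right` holding two `Fig2d` values, one for `rectangle1` and one for `ellipse1`, with the nested elements reachable through `children`.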
