
Initializing Scala regex in PySpark for Scala class constructor

I am working in a Jupyter Notebook with PySpark v2.3.4, which runs on Java 8, Python 3.6 (with py4j==0.10.7), and Scala 2.11. I have a Scala case class that takes a scala.util.matching.Regex (Scala doc) as an argument, like so:

case class myClass(myString: String, myRegex: Regex) 

I would like to construct an object from myClass, but I can't figure out how to construct a scala.util.matching.Regex object in a Python / PySpark environment. Below are my attempts, along with the docs I followed, to create a Scala regex, where sc is my SparkContext.

  • sc._jvm.scala.util.matching.Regex("""(S|s)cala""")
    • Error: Constructor scala.util.matching.Regex([class java.lang.String]) does not exist
    • This error message dumbfounds me because the Scala 2.11 docs clearly state that the constructor takes a java.lang.String.
  • sc._jvm.scala.util.matching.Regex("(S|s)cala")
    • Same error as above
  • sc._jvm.scala.util.matching.Regex(r"(S|s)cala")
    • Same error as above
  • sc._jvm.scala.util.matching.Regex("(S|s)cala".r) (the way they do it in Scala)
    • Error: Python string does not have attribute "r"
  • sc._jvm.java.util.regex.Pattern.compile("(S|s)cala") successfully creates a Java regex pattern -- and the scala doc clearly states that the Scala regex delegates to the Java regex package...

Any help/advice would be much appreciated! Thanks in advance!

I figured it out lol

The Scala Regex constructor takes a second parameter called groupNames. It is declared as a String vararg (String*), so in Scala you can pass zero or more names. On the JVM, however, a Scala vararg compiles to a single scala.collection.Seq parameter, so Py4J sees a two-argument constructor and the second argument is required. Passing None (which Py4J sends as null) fills that second argument:

sc._jvm.scala.util.matching.Regex("(S|s)cala", None)
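If you want to check what constructor signatures Py4J is matching against, one option is to list them through Java reflection. This is a minimal sketch, assuming sc is a live SparkContext; list_constructors is a hypothetical helper name, not part of PySpark or Py4J.

```python
def list_constructors(sc, class_name="scala.util.matching.Regex"):
    """Return the JVM constructor signatures of class_name as Python strings.

    `sc` is assumed to be an active SparkContext (hypothetical helper).
    """
    jvm = sc._jvm
    # Load the class through plain Java reflection via the Py4J gateway.
    java_class = jvm.java.lang.Class.forName(class_name)
    # getConstructors() returns a Java array; each element is a
    # java.lang.reflect.Constructor whose toString() lists the parameter types.
    return [str(ctor.toString()) for ctor in java_class.getConstructors()]
```

Calling list_constructors(sc) should show the second parameter as a scala.collection.Seq rather than a String array, which is why the one-argument call is rejected.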

NOTE: I haven't figured out how to pass in actual vararg values yet, though. Passing comma-separated strings, arrays, and tuples didn't work. Any help on that would be great, thanks :)
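One approach that may work for the vararg is to build a Scala Seq on the JVM side and pass that as the second argument. The sketch below assumes that org.apache.spark.api.python.PythonUtils.toSeq is available on your Spark version (it is an internal helper that PySpark uses itself, not a public API); make_scala_regex is a hypothetical helper name, and I have not run this against every Spark/Scala combination.

```python
def make_scala_regex(sc, pattern, group_names=()):
    """Build a scala.util.matching.Regex through Py4J.

    `sc` is assumed to be an active SparkContext (hypothetical helper).
    """
    jvm = sc._jvm
    # Copy the Python strings into a java.util.ArrayList explicitly, so the
    # sketch does not rely on Py4J's automatic list conversion being enabled.
    java_list = jvm.java.util.ArrayList()
    for name in group_names:
        java_list.add(name)
    # Assumption: PythonUtils.toSeq (internal to Spark) turns the Java list
    # into a scala.collection.Seq, which is what `String*` compiles to.
    names_seq = jvm.org.apache.spark.api.python.PythonUtils.toSeq(java_list)
    return jvm.scala.util.matching.Regex(pattern, names_seq)
```

With this, make_scala_regex(sc, "(S|s)cala") passes an empty Seq instead of null, and make_scala_regex(sc, "(S|s)(cala)", ["first", "rest"]) passes two group names.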
