
Scala | Spark | Invoking undefined method

I am new to Scala and trying to grasp the language fundamentals. I have a working knowledge of Spark with the Java API.

I am having a hard time understanding some Scala code, and therefore I am not able to write the same thing in Java. I got this piece of code from https://docs.microsoft.com/en-us/azure/cosmos-db/spark-connector

// Import Necessary Libraries
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark._
import com.microsoft.azure.cosmosdb.spark.config.Config

// Read Configuration
val readConfig = Config(Map(
  "Endpoint" -> "https://doctorwho.documents.azure.com:443/",
  "Masterkey" -> "YOUR-KEY-HERE",
  "Database" -> "DepartureDelays",
  "Collection" -> "flights_pcoll",
  "query_custom" -> "SELECT c.date, c.delay, c.distance, c.origin, c.destination FROM c WHERE c.origin = 'SEA'" // Optional
))

// Connect via azure-cosmosdb-spark to create Spark DataFrame
val flights = spark.read.cosmosDB(readConfig)
flights.count()

As far as I know, the read method returns an object of type org.apache.spark.sql.DataFrameReader, and that class has no cosmosDB() method, so how does this code work? Also, how do I convert this code to Java?

Thank you

What you are seeing is the magic of Scala implicit conversions. The compiler sees that you intend to call the cosmosDB method of a DataFrameReader and that there's no method of that name with the proper signature, as you note.

When you

import com.microsoft.azure.cosmosdb.spark.schema._

you also import the contents of that package's package object (current git commit as of this writing; it was last updated in 2017, so it's stable code). The relevant part that gets imported is

implicit def toDataFrameReaderFunctions(dfr: DataFrameReader): DataFrameReaderFunctions
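
To see the mechanism in isolation, here is a minimal, self-contained sketch. FakeReader, FakeReaderFunctions and fakeschema are hypothetical stand-ins for DataFrameReader, DataFrameReaderFunctions and the connector's schema package object; they only mimic the shape of that code and use neither Spark nor the connector.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for Spark's DataFrameReader: it has no cosmosDB method.
class FakeReader {
  val source: String = "cosmos"
}

// Hypothetical stand-in for the connector's DataFrameReaderFunctions wrapper.
class FakeReaderFunctions(reader: FakeReader) {
  def cosmosDB(config: Map[String, String]): String =
    reader.source + ":" + config.getOrElse("Collection", "<none>")
}

// Plays the role of the connector's `schema` package object (hypothetical).
object fakeschema {
  // A one-argument implicit def: an implicit conversion FakeReader => FakeReaderFunctions.
  implicit def toFakeReaderFunctions(dfr: FakeReader): FakeReaderFunctions =
    new FakeReaderFunctions(dfr)
}

object ImplicitDemo {
  import fakeschema._ // analogous to `import com.microsoft.azure.cosmosdb.spark.schema._`

  val reader = new FakeReader
  // FakeReader has no cosmosDB member, so the compiler rewrites this call to
  // toFakeReaderFunctions(reader).cosmosDB(...).
  val result: String = reader.cosmosDB(Map("Collection" -> "flights_pcoll"))
}
```

Without the `import fakeschema._` line, `reader.cosmosDB(...)` would simply fail to compile, which is why the import in the original snippet matters.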

An implicit def that takes a single argument tells the compiler that, whenever this def is in scope, it may insert a call to the method if:

  • the expression it is looking at is a DataFrameReader
  • a method is being called that is not a member of DataFrameReader
  • com.microsoft.azure.cosmosdb.spark.schema.DataFrameReaderFunctions has a member with the desired name and signature
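
The second condition is easy to overlook: the conversion is only tried when the member is missing. In this sketch (again with hypothetical DemoReader and DemoReaderFunctions types), both classes define format, so calling format uses DemoReader's own method, while cosmosDB exists only on DemoReaderFunctions and therefore goes through the conversion.

```scala
import scala.language.implicitConversions

// Hypothetical types used only to illustrate when the conversion kicks in.
class DemoReader {
  def format(source: String): String = "DemoReader.format(" + source + ")"
}

class DemoReaderFunctions(reader: DemoReader) {
  // Same name and signature as a real member of DemoReader: never reached through
  // the conversion, because the compiler only converts when the member is missing.
  def format(source: String): String = "DemoReaderFunctions.format(" + source + ")"

  // Only defined here, so calling it on a DemoReader triggers the conversion.
  def cosmosDB(collection: String): String = "DemoReaderFunctions.cosmosDB(" + collection + ")"
}

object LookupRules {
  implicit def toDemoReaderFunctions(r: DemoReader): DemoReaderFunctions =
    new DemoReaderFunctions(r)

  val reader = new DemoReader
  val own: String = reader.format("json")           // member exists: no conversion
  val enriched: String = reader.cosmosDB("flights") // member missing: conversion inserted
}
```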

Since DataFrameReaderFunctions has a method cosmosDB, the compiler translates your code to

toDataFrameReaderFunctions(spark.read).cosmosDB(readConfig)
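
Because the rewritten form is an ordinary method call, you can also write it out by hand. The sketch below uses hypothetical SketchReader and SketchReaderFunctions types (not the connector's) and checks that the sugared and explicit calls produce the same result. The explicit form is also the shape a Java translation has to take, since Java has no implicit conversions; how the connector's Scala package object is exposed to Java callers is not shown here and would need to be checked against the compiled library.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins, not the real Spark or connector classes.
final case class SketchReader(options: Map[String, String])

final class SketchReaderFunctions(reader: SketchReader) {
  // Merges the reader's options with a connector-style config map.
  def cosmosDB(config: Map[String, String]): Map[String, String] = reader.options ++ config
}

object sketchschema {
  implicit def toSketchReaderFunctions(r: SketchReader): SketchReaderFunctions =
    new SketchReaderFunctions(r)
}

object Desugared {
  import sketchschema._

  val reader = SketchReader(Map("format" -> "cosmos"))
  val config = Map("Collection" -> "flights_pcoll")

  // What you write: the compiler inserts the conversion.
  val sugared: Map[String, String] = reader.cosmosDB(config)
  // What the compiler generates, written out by hand (no implicit lookup needed).
  val desugared: Map[String, String] = toSketchReaderFunctions(reader).cosmosDB(config)
}
```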

This general approach of using an implicit conversion to make it look like you're adding methods to a type without modifying the type is called enrichment, or an extension method. Implicit conversions in general should probably be avoided: they very often make code hard to follow, and an errant implicit conversion in scope can make code compile that you never intended to compile. For an enrichment like this, there's an alternative: use an implicit class, for which the compiler essentially autogenerates the implicit conversion. Because that generated conversion only targets the new class, it can add methods to a type but can't, for example, let you use an Int in place of a String.
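
A minimal sketch of the implicit-class alternative, assuming a hypothetical PlainReader type in place of DataFrameReader and a hypothetical CosmosOps class in place of DataFrameReaderFunctions. The compiler derives the PlainReader => CosmosOps conversion from the class definition, so no separate implicit def is written.

```scala
// Hypothetical stand-in type, not the real Spark DataFrameReader.
class PlainReader {
  def load(path: String): String = "load(" + path + ")"
}

object cosmosSyntax {
  // An implicit class: from this definition the compiler generates an implicit
  // conversion PlainReader => CosmosOps, so no separate implicit def is needed.
  // The conversion only targets CosmosOps, so it adds methods to PlainReader
  // without letting a PlainReader stand in for some unrelated type.
  implicit class CosmosOps(reader: PlainReader) {
    def cosmosDB(collection: String): String = "cosmosDB(" + collection + ")"
  }
}

object ImplicitClassDemo {
  import cosmosSyntax._

  val reader = new PlainReader
  val own: String = reader.load("/tmp/data")              // PlainReader's own method
  val enriched: String = reader.cosmosDB("flights_pcoll") // added by CosmosOps
}
```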
