Any help would be very much appreciated.
I am trying to build a DataFrame from data stored in MongoDB.
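import org.apache.spark.sql.SparkSession
import com.mongodb.spark.MongoSpark

// uri holds the MongoDB connection string (defined elsewhere)
val spark = SparkSession.builder()
  .master("local")
  .appName("app")
  .config("spark.mongodb.input.uri", uri)
  .config("spark.mongodb.input.collection", "collectionName")
  .config("spark.mongodb.input.readPreference.name", "secondary")
  .getOrCreate()
// Load the collection as a DataFrame and keep a single row
val df = MongoSpark.load(spark).limit(1)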
From there I'm trying to read elements row by row. The schema of the DataFrame looks something like this:
root
 |-- A: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- B: boolean (nullable = true)
 |-- C: string (nullable = true)
 |-- D: string (nullable = true)
 |-- E: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- a: string (nullable = true)
 |    |    |-- b: string (nullable = true)
 |    |    |-- c: string (nullable = true)
 |    |    |-- d: string (nullable = true)
If the DataFrame does not include E, dataframe.show() prints just fine.
However, if the DataFrame does include E, dataframe.show() fails with:
Cannot cast STRING into a StructType(StructField(a,StringType,true), StructField(b,StringType,true), StructField(c,StringType,true), StructField(d,StringType,true)) (value: BsonString{value='http://...some url...'})
I have tried pretty much every solution to this problem that I could find on Stack Overflow, but I still can't get past this error.
How should I approach this problem? Thank you!
E is actually an array of objects, each of which contains multiple strings.
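The error message suggests that at least one element of E in the collection is a plain string (a URL) rather than an object, while the inferred schema expects a struct. One thing that might be worth trying, as a rough sketch only, is to skip schema inference and pass an explicit schema in which E is declared as an array of strings. The object name ExplicitSchemaSketch and the function loadWithSchema are made up for illustration. The sketch assumes the 2.x connector (the one that uses the spark.mongodb.input.* settings), and also assumes that it renders sub-documents as JSON text when the target type is a string, which should be verified against the connector version in use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical helper, not part of the connector API.
object ExplicitSchemaSketch {
  // Same fields as the inferred schema, except that E is declared as
  // array<string> so that string elements no longer need a cast to struct.
  val schema: StructType = StructType(Seq(
    StructField("A", StructType(Seq(
      StructField("oid", StringType, nullable = true)
    )), nullable = true),
    StructField("B", BooleanType, nullable = true),
    StructField("C", StringType, nullable = true),
    StructField("D", StringType, nullable = true),
    StructField("E", ArrayType(StringType, containsNull = true), nullable = true)
  ))

  // Reads the collection configured on the session using the explicit schema
  // instead of letting the connector sample documents to infer one.
  def loadWithSchema(spark: SparkSession): DataFrame =
    spark.read
      .format("com.mongodb.spark.sql.DefaultSource")
      .schema(schema)
      .load()
}
```

With the session from the question this would be called as ExplicitSchemaSketch.loadWithSchema(spark).limit(1).show(false). If that works, the object-valued elements of E would still need to be parsed afterwards, so this is more of a way to see what the data really looks like than a complete fix.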