
Spark Scala MongoDB - MongoTypeConversionException: Cannot cast STRING into a StructType(…)

Any help would be very much appreciated.

I am trying to build a DataFrame from data in MongoDB.

import org.apache.spark.sql.SparkSession
import com.mongodb.spark.MongoSpark

// `uri` is the MongoDB connection string
val spark = SparkSession.builder()
      .master("local")
      .appName("app")
      .config("spark.mongodb.input.uri", uri)
      .config("spark.mongodb.input.collection", "collectionName")
      .config("spark.mongodb.input.readPreference.name", "secondary")
      .getOrCreate()

// Build a DataFrame from the configured collection, limited to one row
val df = MongoSpark.load(spark).limit(1)

From there I'm trying to read elements row by row. The schema of the DataFrame looks something like this:

root
 |-- A: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- B: boolean (nullable = true)
 |-- C: string (nullable = true)
 |-- D: string (nullable = true)
 |-- E: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- a: string (nullable = true)
 |    |    |-- b: string (nullable = true)
 |    |    |-- c: string (nullable = true)
 |    |    |-- d: string (nullable = true)

If the DataFrame does not include E, dataframe.show() prints out just fine.
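For example, a quick check (assuming Spark pushes the column pruning down to the Mongo connector) is to drop E before showing, so the connector never has to convert that field:

// Hedged check: with E pruned away, the remaining columns should display normally
df.drop("E").show()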

However, if the DataFrame does include E, then dataframe.show() gives me:

Cannot cast STRING into a StructType(StructField(a,StringType,true), StructField(b,StringType,true), StructField(c,StringType,true), StructField(d,StringType,true)) (value: BsonString{value='http://...some url...'})

I have tried pretty much every solution to this problem listed on Stack Overflow, but I still have no luck getting past this error.

How should I approach this problem? Thank you!

E is actually an array of objects, each of which contains multiple strings.

An example of a MongoDB document was attached as an image.
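Among the suggestions I came across was to skip schema inference and pass an explicit schema to the reader instead. A rough sketch of what I understand that to look like (assuming the connector's "com.mongodb.spark.sql.DefaultSource" data source, with the field names taken from the schema above and E declared as an array of plain strings, since the failing value is a bare URL):

import org.apache.spark.sql.types._

val explicitSchema = StructType(Seq(
  StructField("A", StructType(Seq(StructField("oid", StringType, nullable = true))), nullable = true),
  StructField("B", BooleanType, nullable = true),
  StructField("C", StringType, nullable = true),
  StructField("D", StringType, nullable = true),
  // E declared as array<string> here; adjust if the documents really hold sub-documents
  StructField("E", ArrayType(StringType, containsNull = true), nullable = true)
))

val dfWithSchema = spark.read
  .format("com.mongodb.spark.sql.DefaultSource")  // MongoDB Spark connector data source
  .schema(explicitSchema)                         // bypass sampling-based inference
  .load()
  .limit(1)

dfWithSchema.show()

I'm not sure this is the right direction, though, since other documents may genuinely store sub-documents in E rather than strings.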
