I'm trying to load data from MapR DB into a Spark DataFrame and then export that DataFrame to CSV files, but I'm getting this error:
"com.mapr.db.spark.exceptions.SchemaMappingException: Failed to parse a value for data type NullType (current token: STRING)"
I tried a couple of ways of casting the columns to StringType. This is one of them:
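from pyspark.sql import functions as F, types as T

df = spark.loadFromMapRDB(db_table).select(
    F.col('c_002.v_22').cast(T.StringType()).alias('aaa'),
    F.col('c_002.v_23').cast(T.StringType()).alias('bbb')
)
df.printSchema()  # printSchema() prints the schema itself and returns None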
Output of printSchema():
root
|-- aaa: string (nullable = true)
|-- bbb: string (nullable = true)
Values in column 'aaa' & 'bbb' can be null. Then I'm trying to export the df to CSV files:
df = df.repartition(10)
df.write.csv(csvFile, compression='gzip', mode='overwrite', sep=',', header='true', quoteAll='true')
I was getting a similar issue with a MapR-DB JSON table, and I was able to resolve it by defining the table schema when loading into a DataFrame:
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

tableSchema = StructType([
    StructField("c_002.v_22", StringType(), True), # True here signifies nullable: https://spark.apache.org/docs/2.3.1/api/python/pyspark.sql.html?highlight=structfield#pyspark.sql.types.StructField
    StructField("c_002.v_23", StringType(), True),
])
df = spark.loadFromMapRDB(db_table, tableSchema).select(
F.col('c_002.v_22').alias('aaa'),
F.col('c_002.v_23').alias('bbb')
)
Another thing you could try is simply filling the null values with something: https://spark.apache.org/docs/2.3.1/api/python/pyspark.sql.html#pyspark.sql.DataFrame.fillna
df = df.na.fill('null')