
Unable to load Hive table from Spark DataFrame with more than 25 columns on HDP 3

We were trying to populate a Hive table from spark-shell. A DataFrame with 25 columns was written to the Hive table successfully using the Hive Warehouse Connector, but with more than 25 columns we got the error below:

Caused by: java.lang.IllegalArgumentException: Missing required char ':' at 'struct<_c0:string,_c1:string,_c2:string,_c3:string,_c4:string,_c5:string,_c6:string,_c7:string,_c8:string,_c9:string,_c10:string,_c11:string,_c12:string,_c13:string,_c14:string,_c15:string,_c16:string,_c17:string,_c18:string,_c19:string,_c20:string,_c21:string,_c22:string,_c23:string,...^ 2 more fields>'
  at org.apache.orc.TypeDescription.requireChar(TypeDescription.java:293)

Below is sample data from the input file (a CSV):

|col1                |col2 |col3 |col4               |col5    |col6           |col7       |col8    |col9    |col10   |col11   |col12   |col13   |col14   |col15   |col16 |col17|col18                                        |col19   |col20  |col21    |col22    |col23    |col24                               |col25|col26     |
|--------------------|-----|-----|-------------------|--------|---------------|-----------|--------|--------|--------|--------|--------|--------|--------|--------|------|-----|---------------------------------------------|--------|-------|---------|---------|---------|------------------------------------|-----|----------|
|11111100000000000000|CID81|DID72|2015-08-31 00:17:00|null_val|919122222222222|1627298243 |null_val|null_val|null_val|null_val|null_val|null_val|Download|null_val|Mobile|NA   |x-nid:xyz<-ch-nid->N4444.245881.ABC-119490111|12452524|1586949|sometext |sometext |sometext1|8b8d94af-5407-42fa-9c4f-baaa618377c8|Click|2015-08-31|
|22222200000000000000|CID82|DID73|2015-08-31 00:57:00|null_val|919122222222222|73171145211|null_val|null_val|null_val|null_val|null_val|null_val|Download|null_val|Tablet|NA   |x-nid:xyz<-ch-nid->N4444.245881.ABC-119490111|12452530|1586956|88200211 |88200211 |sometext2|9b04580d-1669-4eb3-a5b0-4d9cec422f93|Click|2015-08-31|
|33333300000000000000|CID83|DID74|2015-08-31 00:17:00|null_val|919122222222222|73171145211|null_val|null_val|null_val|null_val|null_val|null_val|Download|null_val|Laptop|NA   |x-nid:xyz<-ch-nid->N4444.245881.ABC-119490111|12452533|1586952|sometext2|sometext2|sometext3|3ab8511d-6f85-4e1f-8b11-a1d9b159f22f|Click|2015-08-31|

The Spark shell was launched with the following command:

spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar --conf spark.hadoop.metastore.catalog.default=hive --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://sandbox-hdp.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=raj_ops"

The HDP version is 3.0.1.

The Hive table was created with the following commands:

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

hive.createTable("tablename").ifNotExists().column()...create()

The data was saved with the following command:

df.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector").option("table", "tablename").mode("append").save()

Kindly help us on this.

Thank you in advance.

I faced this problem too. After thoroughly examining the source code of the following classes:

  • org.apache.orc.TypeDescription
  • org.apache.spark.sql.types.StructType
  • org.apache.spark.util.Utils

I found that the culprit was the DEFAULT_MAX_TO_STRING_FIELDS value in the class org.apache.spark.util.Utils:

/* The performance overhead of creating and logging strings for wide schemas can be large. To limit the impact, we bound the number of fields to include by default. This can be overridden by setting the 'spark.debug.maxToStringFields' conf in SparkEnv. */

val DEFAULT_MAX_TO_STRING_FIELDS = 25
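To see why this constant produces the malformed schema string in the error, here is a minimal sketch of the truncation behavior. This is a simplified re-implementation for illustration, not Spark's actual `Utils.truncatedString` code: with 27 fields and a limit of 25, the struct string is cut off with a "... 2 more fields" marker, which the ORC `TypeDescription` parser then rejects.

```scala
// Simplified sketch of Spark's schema-string truncation (illustrative only,
// not the real Utils.truncatedString implementation).
def truncatedString(fields: Seq[String], maxFields: Int = 25): String = {
  if (fields.length > maxFields) {
    val rest = fields.length - maxFields
    // Keep only the first maxFields entries, then append a truncation marker.
    fields.take(maxFields).mkString("struct<", ",", s",... $rest more fields>")
  } else {
    fields.mkString("struct<", ",", ">")
  }
}

// 27 columns, like the failing DataFrame in the question.
val fields = (0 until 27).map(i => s"_c$i:string")
println(truncatedString(fields))
// Ends with ",... 2 more fields>" -- not a valid ORC type string.
```

The truncated text "... 2 more fields" is exactly what appears in the `IllegalArgumentException` above, because the truncated string was passed where a complete ORC schema was expected.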

So, after setting this property in my application, for example conf.set("spark.debug.maxToStringFields", "128"), the issue was gone.
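Since the question launches spark-shell directly, the same property can also be passed as a `--conf` flag. This is the question's own launch command with one extra `--conf` added (128 is an arbitrary value; any value at least as large as your column count should work):

```shell
spark-shell \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --conf spark.debug.maxToStringFields=128 \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://sandbox-hdp.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=raj_ops"
```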

I hope it can help others.
