I have the below dataset as input:
816|27555832600|01|14|25| |
825|54100277425|14|03|25|15|
9003|54100630574| | | | |
809|51445926423|12|08|25|17|
and I am getting the below as output:
null|null|null|null|null|null|
825|54100277425| 14| 3| 25| 15|
null|null|null|null|null|null|
809|51445926423| 12| 8| 25| 17|
816|27555832600|01|14|25|null|
825|54100277425|14|03|25|15|
9003|54100630574|null|null|null|null|
809|51445926423|12|08|25|17|
I have tried the below code to load the .txt or .bz2 file:
val dataset = sparkSession.read.format(formatType)
  .option("delimiter", "|")
  .schema(schema_new)
  .csv(dataFilePath)
I tried your problem statement. I am using Spark 3.0.1 to solve this use case, and it is working as expected. Try the code snippet below.
val sampleDS = spark.read.options(Map("delimiter" -> "|")).csv("D:\\DataAnalysis\\DataSample.csv")
sampleDS.show()
Output ->
+----+-----------+---+---+---+---+---+
| _c0| _c1|_c2|_c3|_c4|_c5|_c6|
+----+-----------+---+---+---+---+---+
| 816|27555832600| 01| 14| 25| | |
| 825|54100277425| 14| 03| 25| 15| |
|9003|54100630574| | | | | |
| 809|51445926423| 12| 08| 25| 17| |
+----+-----------+---+---+---+---+---+
Now consider the case where your input data contains a blank line. Input data after adding a blank line:
816|27555832600|01|14|25| |
825|54100277425|14|03|25|15|
9003|54100630574| | | | |
||||
809|51445926423|12|08|25|17|
After reading the data, you can simply use sampleDS.na.drop.show()
to remove blank or null rows.
Please note that if a line is completely empty (no delimiters at all), Spark does not include it in the DataFrame; it discards such blank lines while reading.
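Putting the steps above together, here is a minimal sketch (assuming the sample file path from the snippet above). One detail worth noting: na.drop() with no arguments drops rows that contain *any* null, while na.drop("all") drops only rows where *every* column is null, which is usually what you want for blank "||||" lines so that partially filled rows like 9003 survive.

```scala
import org.apache.spark.sql.SparkSession

object BlankLineDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BlankLineDemo")
      .master("local[*]")
      .getOrCreate()

    // "delimiter" (or the shorter alias "sep") sets the field separator.
    val sampleDS = spark.read
      .option("delimiter", "|")
      .csv("D:\\DataAnalysis\\DataSample.csv") // path from the example above

    // drop() removes rows containing ANY null column;
    // drop("all") removes only rows where EVERY column is null.
    sampleDS.na.drop("all").show()

    spark.stop()
  }
}
```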