I am new to PySpark. I tried to create a data frame, but I get an error.
from pyspark.sql.functions import to_date, col, year

adjustedDf = crime_mongodb_df.withColumn("Reported Date", to_date(col("Reported Date"), "d/MM/yyyy")).withColumn('year', year("Reported Date"))
yearGroup = adjustedDf.groupBy("year").sum("Offence Count")
yearGroup.printSchema()
yearGroup.show()
The schema prints fine:
root
|-- year: integer (nullable = true)
|-- sum(Offence Count): long (nullable = true)
When I try to show or access yearGroup, I get an error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-113-c8b6150ea8cc> in <module>
4 yearGroup = adjustedDf.groupBy("year").sum("Offence Count")
5 yearGroup.printSchema()
----> 6 yearGroup.show()
7
8 years = sum(yearGroup.select("year").toPandas().values.tolist(),[])
~/FIT5202/jupyter/lib/python3.6/site-packages/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)
378 """
379 if isinstance(truncate, bool) and truncate:
--> 380 print(self._jdf.showString(n, 20, vertical))
381 else:
382 print(self._jdf.showString(n, int(truncate), vertical))
~/FIT5202/jupyter/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
It is strange: my data has 60,000 rows, but if I use only the first 800 rows, it works.
Can I get some help?
Thanks
Found the solution: I needed to remove the null values (rows whose date failed to parse end up with a null year). yearGroup = yearGroup.filter(yearGroup.year.isNotNull())