How to visualize data from a CSV on a map using Python in Spark?

Question

I have a CSV file stored in HDFS. I am using the latest version of Spark and Python 3.7. How can I visualize the contents of the CSV?

I tried the following sample code:

from pyspark.sql.functions import avg

mydataframe = spark.read.csv("/diamonds.csv", header="true", inferSchema="true")

display(mydataframe.select("color","price").groupBy("color").agg(avg("price")))

The issue is that all I see in the output is text that looks like the schema of mydataframe, rather than an actual chart or visualization.

The data has 'latitude' and 'longitude' columns that I would like to use to plot the rows on a map. How can I do that?

Answer 1

Have you considered using Python modules designed for geographic visualization, such as geopandas?

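import geopandas as gpd
from shapely.geometry import Point

# mydataframe is a Spark DataFrame, so collect it into pandas first.
# This assumes the data is small enough to fit in driver memory.
pdf = mydataframe.toPandas()

# Build one point per row (x = longitude, y = latitude).
# The column names must match the header of your CSV.
geometry = [Point(xy) for xy in zip(pdf["LONGITUDE"], pdf["LATITUDE"])]
gdf = gpd.GeoDataFrame(pdf, geometry=geometry)
gdf.plot()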

See Application GeoPandas and Spark for further details.

Answer 2

I had a very similar problem, which I worked on using the Databricks platform. In a nutshell, the idea looks like this:

Use OpenStreetMap tiles with Leaflet.js to render a map with overlays, and use the displayHTML function in a Databricks notebook to render the HTML.
Markers represent the data points on the map.
For small datasets, rendering a few markers is not a problem. For large datasets, rendering becomes a serious problem (the browser can hang). This can be mitigated with the Leaflet.markercluster plugin, which groups nearby markers and lets you drill down by zooming in.
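As a rough sketch of the steps above, the HTML page can be built as a string in Python and then handed to displayHTML. The function name build_leaflet_html, the Leaflet version and the unpkg CDN URLs are assumptions made for illustration, not something from my original notebook, and marker clustering is left out to keep it short.

```python
import json

def build_leaflet_html(points, zoom=4):
    """Return an HTML page that plots (lat, lon, label) tuples with Leaflet.

    Hypothetical helper for illustration; the CDN URLs and version are assumptions.
    """
    if not points:
        raise ValueError("points must not be empty")
    # Centre the map on the mean coordinate of the points.
    center = [sum(p[0] for p in points) / len(points),
              sum(p[1] for p in points) / len(points)]
    # json.dumps yields a valid JavaScript array literal and escapes the labels.
    markers = json.dumps([[lat, lon, str(label)] for lat, lon, label in points])
    return f"""<!DOCTYPE html>
<html>
<head>
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css"/>
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
</head>
<body>
  <div id="map" style="height: 500px;"></div>
  <script>
    var map = L.map('map').setView({json.dumps(center)}, {int(zoom)});
    L.tileLayer('https://tile.openstreetmap.org/{{z}}/{{x}}/{{y}}.png', {{
      attribution: '&copy; OpenStreetMap contributors'
    }}).addTo(map);
    var markers = {markers};
    markers.forEach(function (m) {{
      // bindPopup renders its argument as HTML, so only pass trusted labels.
      L.marker([m[0], m[1]]).addTo(map).bindPopup(m[2]);
    }});
  </script>
</body>
</html>"""

# In a Databricks notebook (not runnable elsewhere), usage might look like:
# rows = mydataframe.select("latitude", "longitude", "color").limit(500).collect()
# displayHTML(build_leaflet_html(
#     [(r["latitude"], r["longitude"], r["color"]) for r in rows]))
```

The limit(500) in the commented usage is an arbitrary cap to keep the number of markers small enough for the browser.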

For any dataset that is large enough to cause trouble in the browser, I would suggest rolling up (aggregating) the data yourself into a usable form before rendering it.
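One possible way to roll the data up is to bin the coordinates into a regular grid and plot one marker per cell with a count. The sketch below shows only the binning logic in plain Python; the function name rollup_to_grid and the cell size are illustrative assumptions. On a large Spark DataFrame you would do the equivalent grouping in Spark before collecting anything to the driver.

```python
import math
from collections import defaultdict

def rollup_to_grid(points, cell_deg=1.0):
    """Aggregate (lat, lon) pairs into square grid cells of cell_deg degrees.

    Returns a list of (cell_center_lat, cell_center_lon, count) tuples,
    sorted by cell index. Hypothetical helper for illustration.
    """
    counts = defaultdict(int)
    for lat, lon in points:
        # Integer index of the grid cell that contains this point.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[key] += 1
    # Report each cell by its centre coordinate and the number of points in it.
    return [((i + 0.5) * cell_deg, (j + 0.5) * cell_deg, n)
            for (i, j), n in sorted(counts.items())]

cells = rollup_to_grid([(10.2, 20.7), (10.9, 20.1), (-0.5, 3.3)], cell_deg=1.0)
print(cells)
```

Each resulting tuple can then be drawn as a single marker whose popup shows the count, which keeps the number of markers bounded by the number of occupied cells rather than the number of rows.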

2 answers

solution1
1 ACCEPTED 2019-08-14 11:24:59

solution2
0 2019-08-15 21:09:16