I have a CSV file in HDFS. I am using the latest version of Spark and Python 3.7. How can I visualize the CSV?
I tried the following sample code:
from pyspark.sql.functions import avg
mydataframe = spark.read.csv("/diamonds.csv", header="true", inferSchema="true")
display(mydataframe.select("color","price").groupBy("color").agg(avg("price")))
The issue is that all I see in the output is text that looks like the schema of mydataframe, as opposed to an actual chart or visualization.
There is a column for 'latitude' and 'longitude' that I would like to use to display on a map. How can I do that?
Have you considered using Python modules designed for geographic visualization, such as GeoPandas?
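import geopandas as gpd
from shapely.geometry import Point
# Spark columns cannot be iterated directly, so collect to pandas first
# (filter or aggregate beforehand if the data is large).
pdf = mydataframe.toPandas()
# Column names must match the CSV header exactly (pandas is case-sensitive).
geometry = [Point(xy) for xy in zip(pdf["LONGITUDE"], pdf["LATITUDE"])]
gdf = gpd.GeoDataFrame(pdf, geometry=geometry)
gdf.plot()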
See Application GeoPandas and Spark for further details.
I had a very similar problem, which I worked on using the Databricks platform. In a nutshell, the idea looks like this:
For any dataset large enough to cause trouble in the browser, I would suggest rolling up the data into a usable form yourself.