I have been experimenting with multiple Spark sessions using the same Spark context. Once the spark-shell is launched, I can create a new Spark sessio ...
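For reference, a minimal sketch of that experiment: `SparkSession.newSession()` creates a second session that shares the SparkContext but keeps its own SQL configuration and temp views (the view name below is illustrative).

```scala
// Inside spark-shell, `spark` is the pre-built SparkSession.
val spark2 = spark.newSession()

// Both sessions share the same SparkContext...
assert(spark2.sparkContext eq spark.sparkContext)

// ...but temp views are session-scoped: this one is invisible to `spark`.
spark2.range(5).createOrReplaceTempView("only_in_session2")
println(spark2.catalog.tableExists("only_in_session2")) // true
println(spark.catalog.tableExists("only_in_session2"))  // false
```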
I'm using Delta Lake with PySpark by submitting the command below. System specs: Spark 3.0.3, Scala 2.12.10, Java 1.8.0, Hadoop 2.7 ...
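For reference, a minimal Delta Lake setup shown in the spark-shell (the question uses PySpark, but the configuration keys are identical; the delta-core 0.8.x / Spark 3.0.x pairing is an assumption):

```scala
// Launch the shell with the Delta package and the two required confs:
//   spark-shell --packages io.delta:delta-core_2.12:0.8.0 \
//     --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
//     --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

// Then a round trip through a Delta table (path is illustrative):
spark.range(10).write.format("delta").mode("overwrite").save("/tmp/delta/demo")
spark.read.format("delta").load("/tmp/delta/demo").show()
```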
I am new to the spark-shell and I am trying to add a new table and read it. I have added this file, workers.txt, and run the commands, but as you can see ...
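A minimal sketch of that workflow, assuming workers.txt holds comma-separated name,age records (the actual layout is truncated out of the snippet):

```scala
// In the spark-shell the needed implicits are pre-imported;
// in compiled code you would add: import spark.implicits._
case class Worker(name: String, age: Int)

val workers = spark.read.textFile("workers.txt").map { line =>
  val Array(name, age) = line.split(",").map(_.trim)
  Worker(name, age.toInt)
}

workers.createOrReplaceTempView("workers")
spark.sql("SELECT * FROM workers").show()
```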
spark-shell throws a NoSuchMethodException if I define a class in the REPL and then call newInstance via reflection, but the same code works fine in nati ...
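This is a known REPL quirk rather than a Spark bug: a class defined in the shell is compiled as an inner class of the line wrapper, so its constructor secretly takes the enclosing instance and the no-arg lookup fails. A sketch of how to see (and dodge) it:

```scala
class Foo { def hello = "hi" }

// In the REPL this prints a constructor taking the enclosing $iw wrapper,
// so reflective no-arg instantiation fails here but succeeds when the same
// class is compiled normally:
classOf[Foo].getDeclaredConstructors.foreach(println)
// classOf[Foo].getConstructor().newInstance()   // throws in the shell

// Workaround: compile the class outside the wrapper with `:paste -raw`
// and a package declaration; reflection then sees an ordinary class.
```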
I installed Apache Spark and have Java and Python installed as well. I set up the environment variables as per this article: https://phoenixnap.com/kb/inst ...
I regularly copy blocks of code into the spark-shell and run the block using :paste followed by Ctrl-D. Sometimes it errors because another line of code is required ...
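One way around that: make each pasted block self-contained, since :paste compiles the whole block as a single unit; names defined in a previous paste must already exist, but references within one block resolve fine (transcript below is illustrative):

```scala
// scala> :paste
// // Entering paste mode (ctrl-D to finish)

def double(x: Int): Int = x * 2
val result = double(21)   // `double` resolves within the same pasted block

// <ctrl-D>
// // Exiting paste mode, now interpreting.
```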
I am trying to display just a few columns in Scala, such as name, address, and zip. I have this so far, but I can't get it to display only 3 colu ...
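Assuming a DataFrame df with those columns, either form below restricts the output to just the three of them:

```scala
// Select by name...
df.select("name", "address", "zip").show()

// ...or via column expressions when a cast or rename is also needed:
import org.apache.spark.sql.functions.col
df.select(col("name"), col("address"), col("zip").cast("string")).show()
```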
I am migrating a Pig script to PySpark and I am new to PySpark, so I am stuck at data loading. My Pig script looks like: Bag1 = LOAD '/refined/em/em_re ...
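Pig's LOAD with a schema maps roughly onto a delimited read with an explicit schema. Shown in Scala to match the rest of this section (the PySpark call is nearly identical); the field names are stand-ins for the truncated Pig schema:

```scala
import org.apache.spark.sql.types._

// Hypothetical schema standing in for Bag1's field list.
val schema = StructType(Seq(
  StructField("id",     StringType),
  StructField("amount", DoubleType)
))

val bag1 = spark.read
  .schema(schema)
  .option("delimiter", "\t")   // Pig's default field delimiter
  .csv("/refined/em/em_re...") // path truncated in the question

// PySpark equivalent: spark.read.schema(...).option("delimiter", "\t").csv(path)
```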
I have quarterly data, and the data keeps growing dynamically as quarters accrue. When the number of quarters is small, I manually edit the query ...
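Rather than hand-editing the SQL each quarter, one option is to derive the quarter list from the data and assemble the query dynamically; a hedged sketch with a hypothetical table sales(quarter, amount):

```scala
// Hypothetical table `sales` with columns (quarter, amount).
val quarters = spark.sql("SELECT DISTINCT quarter FROM sales")
  .collect().map(_.getString(0)).sorted

// One aggregate column per quarter, generated instead of hand-written.
val cols  = quarters.map(q => s"SUM(CASE WHEN quarter = '$q' THEN amount END) AS `$q`")
val query = s"SELECT ${cols.mkString(", ")} FROM sales"
spark.sql(query).show()
```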
I created the same Spark DataFrame in two ways in order to run Spark SQL on it. 1. I read the data from a .csv file straight into a DataFrame in Spark ...
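Way 1 of that comparison typically looks like the following (the header/inferSchema options and path are assumptions):

```scala
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")   // assumption: types should be inferred
  .csv("/path/to/data.csv")

df.createOrReplaceTempView("data")
spark.sql("SELECT COUNT(*) FROM data").show()
```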
This is a general question, but I am hoping someone can answer it. I am comparing query execution times between MongoDB and Spark SQL. Specifically, I h ...
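For the Spark side of such a timing comparison, the shell's spark.time helper is convenient; caching first separates the one-off read cost from the query itself (the view and column names are illustrative):

```scala
// Materialize the data once so the measurement excludes the initial scan.
spark.table("data").cache().count()

// Time only the query under comparison.
spark.time { spark.sql("SELECT COUNT(DISTINCT some_col) FROM data").show() }
```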
I am using the MongoDB Spark Connector to import data from MongoDB and then perform some SQL queries. I will describe the whole process before getting ...
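The read-then-SQL part of that process usually looks like the following; note the format name and URI option changed between connector major versions (the pre-10.x style shown here, which pairs with this vintage of Spark, is an assumption):

```scala
// Connector 2.x/3.x style; 10.x renames the format to "mongodb" and the
// option to spark.mongodb.read.connection.uri.
val people = spark.read
  .format("mongo")
  .option("uri", "mongodb://127.0.0.1:27017/test.people") // URI illustrative
  .load()

people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```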
I am a complete beginner to all of this, so pardon me if I'm missing some totally obvious step. I installed Spark 3.1.2 and Cassandra 3.11.1 ...
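With the DataStax connector on the classpath, the first read usually looks like this; the connector/Spark version pairing and all names are assumptions:

```scala
// Launch (coordinates assumed for Spark 3.1.x / Scala 2.12):
//   spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
//     --conf spark.cassandra.connection.host=127.0.0.1

val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "my_table")) // names illustrative
  .load()

df.show()
```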
I have a CSV file as shown below. While loading the data, by default all the columns load as strings, so I defined a custom schema as String, Int ...
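A custom schema for a CSV read is usually declared like this (column names are stand-ins for the truncated ones). One gotcha: with an explicit schema, cells that fail to parse as the declared type become null under the default PERMISSIVE mode.

```scala
import org.apache.spark.sql.types._

// Explicit schema so columns stop defaulting to StringType.
val schema = StructType(Seq(
  StructField("name", StringType,  nullable = true),
  StructField("age",  IntegerType, nullable = true)
))

val df = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("/path/to/file.csv")

df.printSchema()
```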
I'm getting an error while running the spark-shell command through cmd, and so far I've had no luck resolving it. I have Python/Java/Spark/Hadoop (winutils ...
I'm facing the error below while starting the spark-shell with the YARN master; the shell works with the Spark local master. Below is my spark-defaults.conf. Spark ver ...
I am using s3a to read from a database into a DataFrame and write it out with .parquet(s3a://bucketname//folder). It works for a DataFrame with fewer than 100 columns but crashes ...
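For reference, a baseline s3a parquet write; the credential keys are standard hadoop-aws settings, everything else (values, repartition count, the stand-in DataFrame) is illustrative, and the actual crash cause is truncated out of the snippet:

```scala
// Standard hadoop-aws credential settings (values illustrative).
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

val df = spark.range(100).toDF("id")   // stand-in for the database read

// Fewer, larger files can ease executor pressure when rows are very wide.
df.repartition(8)
  .write
  .mode("overwrite")
  .parquet("s3a://bucketname/folder")  // note: single slash after the bucket
```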
I'm using a dataset that apparently has "double quotes" wrapped around each row. I can't see them because the file opens in Excel by default when I use my brow ...
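If the quotes wrap each entire row (rather than individual fields), the CSV reader sees one big quoted column; one approach is to strip the outer quotes from the raw lines and re-parse (path and options are assumptions):

```scala
// Read raw lines, drop one leading/trailing quote per row, then re-parse.
val raw      = spark.read.textFile("/path/to/data.csv")
val unquoted = raw.map(_.stripPrefix("\"").stripSuffix("\""))

val df = spark.read
  .option("header", "true")
  .csv(unquoted)   // Spark >= 2.2 can parse CSV from a Dataset[String]

df.show()
```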
Why is SparkHadoopUtil not accessible here, whereas it is accessible in a lower version of Spark, even though it is imported? ...
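Likely because SparkHadoopUtil is marked private[spark] in newer releases, so the import resolves but the symbol is unusable outside Spark's own packages. The usual substitute for its main use, fetching the Hadoop configuration, is:

```scala
import org.apache.hadoop.conf.Configuration

// Replaces SparkHadoopUtil.get.conf, which is private[spark] in newer Spark:
val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration
println(hadoopConf.get("fs.defaultFS"))
```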
I have not faced this problem with any other software on my system. I am able to install and run everything in Windows Terminal/Command Prompt and Git Bas ...