I have been experimenting with multiple Spark sessions using the same Spark context. Once the spark-shell is launched, I can create a new Spark sessio ...
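For reference, a minimal sketch of that experiment: `SparkSession.newSession()` creates a second session that shares the SparkContext but keeps its own SQL configuration and temp views (the view name below is illustrative).

```scala
// Inside spark-shell, `spark` is the pre-built SparkSession.
val spark2 = spark.newSession()

// Both sessions share the same SparkContext...
assert(spark2.sparkContext eq spark.sparkContext)

// ...but temp views are session-scoped: this one is invisible to `spark`.
spark2.range(5).createOrReplaceTempView("only_in_session2")
println(spark2.catalog.tableExists("only_in_session2")) // true
println(spark.catalog.tableExists("only_in_session2"))  // false
```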
I'm using Delta Lake with PySpark by submitting the command below. System specs: Spark 3.0.3, Scala 2.12.10, Java 1.8.0, Hadoop 2.7 ...
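For reference, a minimal Delta Lake setup shown in the spark-shell (the question uses PySpark, but the configuration keys are identical; the delta-core 0.8.x / Spark 3.0.x pairing is an assumption):

```scala
// Launch the shell with the Delta package and the two required confs:
//   spark-shell --packages io.delta:delta-core_2.12:0.8.0 \
//     --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
//     --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

// Then a round trip through a Delta table (path is illustrative):
spark.range(10).write.format("delta").mode("overwrite").save("/tmp/delta/demo")
spark.read.format("delta").load("/tmp/delta/demo").show()
```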
I am new to the spark-shell and I am trying to add a new table and read it. I have added this file, workers.txt, and run the commands, but as you can see ...
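A minimal sketch of that workflow, assuming workers.txt holds comma-separated name,age records (the actual layout is truncated out of the snippet):

```scala
// In the spark-shell the needed implicits are pre-imported;
// in compiled code you would add: import spark.implicits._
case class Worker(name: String, age: Int)

val workers = spark.read.textFile("workers.txt").map { line =>
  val Array(name, age) = line.split(",").map(_.trim)
  Worker(name, age.toInt)
}

workers.createOrReplaceTempView("workers")
spark.sql("SELECT * FROM workers").show()
```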
spark-shell throws a NoSuchMethodException if I define a class in the REPL and then call newInstance via reflection, but the same code works fine in nati ...
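This is a known REPL quirk rather than a Spark bug: a class defined in the shell is compiled as an inner class of the line wrapper, so its constructor secretly takes the enclosing instance and the no-arg lookup fails. A sketch of how to see (and dodge) it:

```scala
class Foo { def hello = "hi" }

// In the REPL this prints a constructor taking the enclosing $iw wrapper,
// so reflective no-arg instantiation fails here but succeeds when the same
// class is compiled normally:
classOf[Foo].getDeclaredConstructors.foreach(println)
// classOf[Foo].getConstructor().newInstance()   // throws in the shell

// Workaround: compile the class outside the wrapper with `:paste -raw`
// and a package declaration; reflection then sees an ordinary class.
```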
I installed Apache Spark and have Java and Python installed as well. I set up the environment variables as per this article: https://phoenixnap.com/kb/inst ...
I regularly copy blocks of code into the spark-shell and run the block using :paste followed by Ctrl-D. Sometimes it errors because another line of code is required ...
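One way around that: make each pasted block self-contained, since :paste compiles the whole block as a single unit; names defined in a previous paste must already exist, but references within one block resolve fine (transcript below is illustrative):

```scala
// scala> :paste
// // Entering paste mode (ctrl-D to finish)

def double(x: Int): Int = x * 2
val result = double(21)   // `double` resolves within the same pasted block

// <ctrl-D>
// // Exiting paste mode, now interpreting.
```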
I am trying to display just a few columns in Scala, such as name, address, and zip. I have this so far, but I can't get it to display only 3 colu ...
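Assuming a DataFrame df with those columns, either form below restricts the output to just the three of them:

```scala
// Select by name...
df.select("name", "address", "zip").show()

// ...or via column expressions when a cast or rename is also needed:
import org.apache.spark.sql.functions.col
df.select(col("name"), col("address"), col("zip").cast("string")).show()
```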
I am migrating a Pig script to PySpark and I am new to PySpark, so I am stuck at data loading. My Pig script looks like: Bag1 = LOAD '/refined/em/em_re ...
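Pig's LOAD with a schema maps roughly onto a delimited read with an explicit schema. Shown in Scala to match the rest of this section (the PySpark call is nearly identical); the field names are stand-ins for the truncated Pig schema:

```scala
import org.apache.spark.sql.types._

// Hypothetical schema standing in for Bag1's field list.
val schema = StructType(Seq(
  StructField("id",     StringType),
  StructField("amount", DoubleType)
))

val bag1 = spark.read
  .schema(schema)
  .option("delimiter", "\t")   // Pig's default field delimiter
  .csv("/refined/em/em_re...") // path truncated in the question

// PySpark equivalent: spark.read.schema(...).option("delimiter", "\t").csv(path)
```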
I have quarterly data, and the data keeps growing dynamically as quarters accrue. When the number of quarters is small, I manually edit the query ...
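Rather than hand-editing the SQL each quarter, one option is to derive the quarter list from the data and assemble the query dynamically; a hedged sketch with a hypothetical table sales(quarter, amount):

```scala
// Hypothetical table `sales` with columns (quarter, amount).
val quarters = spark.sql("SELECT DISTINCT quarter FROM sales")
  .collect().map(_.getString(0)).sorted

// One aggregate column per quarter, generated instead of hand-written.
val cols  = quarters.map(q => s"SUM(CASE WHEN quarter = '$q' THEN amount END) AS `$q`")
val query = s"SELECT ${cols.mkString(", ")} FROM sales"
spark.sql(query).show()
```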
I created the same Spark DataFrame in two ways in order to run Spark SQL on it. 1. I read the data from a .csv file straight into a DataFrame in Spark ...
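Way 1 of that comparison typically looks like the following (the header/inferSchema options and path are assumptions):

```scala
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")   // assumption: types should be inferred
  .csv("/path/to/data.csv")

df.createOrReplaceTempView("data")
spark.sql("SELECT COUNT(*) FROM data").show()
```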
This is a general question, but I am hoping someone can answer it. I am comparing query execution times between MongoDB and Spark SQL. Specifically, I h ...
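For the Spark side of such a timing comparison, the shell's spark.time helper is convenient; caching first separates the one-off read cost from the query itself (the view and column names are illustrative):

```scala
// Materialize the data once so the measurement excludes the initial scan.
spark.table("data").cache().count()

// Time only the query under comparison.
spark.time { spark.sql("SELECT COUNT(DISTINCT some_col) FROM data").show() }
```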
I am using the MongoDB Spark Connector to import data from MongoDB and then perform some SQL queries. I will describe the whole process before getting ...
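The read-then-SQL part of that process usually looks like the following; note the format name and URI option changed between connector major versions (the pre-10.x style shown here, which pairs with this vintage of Spark, is an assumption):

```scala
// Connector 2.x/3.x style; 10.x renames the format to "mongodb" and the
// option to spark.mongodb.read.connection.uri.
val people = spark.read
  .format("mongo")
  .option("uri", "mongodb://127.0.0.1:27017/test.people") // URI illustrative
  .load()

people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```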
I am a complete beginner to all of this, so pardon me if I'm missing some totally obvious step. I installed Spark 3.1.2 and Cassandra 3.11.1 ...
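With the DataStax connector on the classpath, the first read usually looks like this; the connector/Spark version pairing and all names are assumptions:

```scala
// Launch (coordinates assumed for Spark 3.1.x / Scala 2.12):
//   spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
//     --conf spark.cassandra.connection.host=127.0.0.1

val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "my_table")) // names illustrative
  .load()

df.show()
```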
I have a CSV file as shown below. While loading the data, by default all the columns load as strings, so I defined a custom schema as String, Int ...
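A custom schema for a CSV read is usually declared like this (column names are stand-ins for the truncated ones). One gotcha: with an explicit schema, cells that fail to parse as the declared type become null under the default PERMISSIVE mode.

```scala
import org.apache.spark.sql.types._

// Explicit schema so columns stop defaulting to StringType.
val schema = StructType(Seq(
  StructField("name", StringType,  nullable = true),
  StructField("age",  IntegerType, nullable = true)
))

val df = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("/path/to/file.csv")

df.printSchema()
```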
I'm getting an error while running the spark-shell command through cmd, and so far I've had no luck resolving it. I have Python/Java/Spark/Hadoop (winutils ...
I'm facing the error below while starting the spark-shell with the YARN master; the shell works with the Spark local master. Below is my spark-defaults.conf. Spark ver ...
I am using s3a to read from a database into a DataFrame and write it out with .parquet(s3a://bucketname//folder). It works for a DataFrame with fewer than 100 columns but crashes ...
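For reference, a baseline s3a parquet write; the credential keys are standard hadoop-aws settings, everything else (values, repartition count, the stand-in DataFrame) is illustrative, and the actual crash cause is truncated out of the snippet:

```scala
// Standard hadoop-aws credential settings (values illustrative).
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

val df = spark.range(100).toDF("id")   // stand-in for the database read

// Fewer, larger files can ease executor pressure when rows are very wide.
df.repartition(8)
  .write
  .mode("overwrite")
  .parquet("s3a://bucketname/folder")  // note: single slash after the bucket
```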
I'm using a dataset that apparently has "double quotes" wrapped around each row. I can't see them because the file opens in Excel by default when I use my brow ...
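If the quotes wrap each entire row (rather than individual fields), the CSV reader sees one big quoted column; one approach is to strip the outer quotes from the raw lines and re-parse (path and options are assumptions):

```scala
// Read raw lines, drop one leading/trailing quote per row, then re-parse.
val raw      = spark.read.textFile("/path/to/data.csv")
val unquoted = raw.map(_.stripPrefix("\"").stripSuffix("\""))

val df = spark.read
  .option("header", "true")
  .csv(unquoted)   // Spark >= 2.2 can parse CSV from a Dataset[String]

df.show()
```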
Why is SparkHadoopUtil not accessible here, whereas it is accessible in a lower version of Spark, even though it is imported? ...
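Likely because SparkHadoopUtil is marked private[spark] in newer releases, so the import resolves but the symbol is unusable outside Spark's own packages. The usual substitute for its main use, fetching the Hadoop configuration, is:

```scala
import org.apache.hadoop.conf.Configuration

// Replaces SparkHadoopUtil.get.conf, which is private[spark] in newer Spark:
val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration
println(hadoopConf.get("fs.defaultFS"))
```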
I have not faced this problem with any other software on my system. I am able to install and run everything in Windows Terminal/Command Prompt and Git Bas ...