I am trying to configure Apache Spark (PySpark) in Visual Studio Code. I installed the "Spark & Hive Tools" extension pack in VS Code and added Python > ...
Issue: I run a Spark job that uses all the cores on all the nodes, yet in the Dataproc CPU monitoring graph the CPU usage touches a max of 12% ...
I have a requirement to read a Hive DB table and write that information in text format in EBCDIC encoding, as that will be used as an input to a mainfr ...
The following split/index will retrieve the output 'accountv2' from The split/index code is as follows: Can someone help modify the ...
I have data that is grouped as follows: before I would like to expand the dataframe to be ungrouped into a table that looks like the image ...
I have a large parquet file where the data in one of the columns is sorted. A very simplified example is below. I am interested in querying the las ...
I have a dataframe like this; for the sake of simplicity I'm just showing 2 columns. Both columns are strings, but in real life it will have more column ...
I am working through a huge list of package names for customers which need to be parsed to find out price information. Sample package names are as fol ...
I have a string column (DOB) like below: DOB 1973-Jun-28 1978-May-02 I want to convert this to Date type. I tried the below but it's showing null v ...
I have a Python script that runs statistical analysis and trained deep learning models on input data. The data size is fairly small (~5 MB); however, the ...
I am trying to convert a column that contains Zulu-formatted timestamps to a typical datetime format. This is an example of the format the dates are i ...
I want to convert a JSON string held in a variable to a PySpark DataFrame on Databricks. I have a payload coming from an API. It is a list of JSON objects hold o ...
How can I check whether two RDDs generated the way I did contain the same data, including the number of rows? I'm using ScalaTest to run the tests and Spark ve ...
Input: The query below worked in SQL Server, but because of a correlated subquery it does not work in Spark SQL. Is there any alternative, either with Spark SQL ...
I have a table containing Employee IDs and I'd like to add an additional column for Month containing 12 values (1 for each month). I'd like to create ...
Hi, I have a case where I need to subtract all column values between two PySpark dataframes like this: df1: df2: And I want to get the final da ...
There is a CSV with a column ID (format: 8 digits & "D" at the end). When reading the CSV with .option("inferSchema", "true"), it returns the data typ ...
I have one dataframe and within that dataframe there is a column that contains a string value. I need to extract a substring from that column whenever ...
Requirements: I want to create a dataframe out of one column of an existing dataframe. That column's value is a list of multiple JSON entries. Problem: Since the j ...
How can I remove leading zeros after joining? For example, I want this data to be Thank you in advance!! ...