I have a dataframe with the fields parent_id, service_id, product_relation_id, and product_name, as given below. I want to assign an id field as shown in the table ...
I am using PySpark and want to take advantage of multiple nodes to improve performance. For example: Suppose I have 3 columns and have 1 mi ...
I'm trying to filter a dataframe in PySpark using "isin", and I also tried another way of filtering, but I'm unable to get the correct result. I'm getting an error of Spark ...
In GCP Dataproc (with PySpark), I am working on a task: reading a JSON file according to a custom schema and loading it into a DataFrame. I have the following sample ...
Given a PySpark dataframe given_df, I need to use it to generate a new dataframe new_df. I am trying to process the PySpark dataframe row by ...
I have a dataframe containing minute level values that looks like below: I'd like to save it as a partitioned parquet file so that I can optimize f ...
I am trying to read a list of CSV files from Azure Data Lake one by one and, after some checking, I want to union them all into a single dataframe. In thi ...
I have two dataframes df1 and df2, the following is the content of each one. df1: df2: I need to find only the id that is in df1.line_item_usag ...
I have an id column for each person (data with the same id belongs to one person). I want these: Now the id column is not based on numbering, it's ...
I have two dataframes named df1 and df2, the content of each dataframe is as follows. df1: df2: I need the IDs of df1.line_item_usage_account_i ...
I have two DataFrames called DF1 and DF2, the content of each DataFrame is as follows: df1: df2: I need to make a join for fields df1.line_item ...
I have a dataframe where I need to convert rows of the same group to columns. Basically, I need to pivot these. Below is my df. I need the resultant data for ...
While loading data from Oracle and writing to PostgreSQL, I am facing a weird issue: I am unable to write a string with a space to Postgres. I am facing the issue below. So t ...
I am trying to read only selected columns when reading a CSV file. Suppose the CSV file has 10 columns but I want to read only 5 columns. Is there any way t ...
I have a dataframe called df which contains the following: df is grouped by the accountname field, and I need to filter by the clustername ...
I have an integer column called birth_date in this format: 20141130. I want to convert that to 2014-11-30 in PySpark. This converts the date incorrec ...
I have these tables: df1 df2 +---+------------+ +---+---------+ | id| many_cols| | id|criterion| +---+------------+ +---+--- ...
I tried PyArrow as well. In my example, I got the Spark dataframe using a spark.sql statement, after which I wanted to convert it to a pandas dataframe. To sho ...
I try to create and populate a pyspark dataframe with date values. gives It looks correct, but the ValidFrom and ValidTo values are strings, not ...
Example: I have a PySpark dataframe as: Let's say I have some calculation to be done on each column of df, which I do inside a for loop. After that my ...