I've looked into my job and have identified that I do indeed have a skewed task. How do I determine what the actual value is inside this task that is ...
I get the following error when my Spark job fails: **"org.apache.spark.shuffle.FetchFailedException: The relative remote executor(Id: 21), which maint ...
I'm having serious difficulty understanding why I cannot run a transform that, after many minutes (sometimes hours) of waiting, returns the error "S ...
I see in my repository it's warning me about using union and instead I should use unionByName. Aren't these the same thing? Why would I care which one ...
I'm working on exporting data from Foundry datasets in parquet format using various Magritte export tasks to an ABFS system (but the same issue occurs ...
I have some PySpark code I'm writing where I want to execute joins and other operations, but I want to log when this phase is successfully completed. ...
I'm noticing my code repo is warning me that using withColumn in a for/while loop is an antipattern. Why is this not recommended? Isn't this a normal ...
I've read the docs in Foundry for what the differences are between the two, but I'm wondering in what circumstances I would want to apply the STATIC_A ...
I have a dataset I want to repartition evenly into 10 buckets per unique value of a column, and I want to size this result into a large number of part ...
I want to run df.count() on my DataFrame, but I know my total dataset size is pretty large. Does this run the risk of materializing the data back to t ...
My Foundry transform produces a different amount of data on different runs, but I want a similar number of rows in each file. I can use DataFr ...
I have a set of .xml documents that I want to parse. I have previously tried to parse them using methods that take the file contents and dump them in ...
I want to parse a series of .csv files using spark.read.csv, but I want to include the row number of each line inside the file. I know that Spark typ ...
I'd like to test different inputs to a PySpark regex to see if they fail/succeed before running a build. Is there a way to test this in Foundry before ...
I want to Hive-partition my dataset, but I don't quite know how to ensure the file counts in the splits are sane. I know I should roughly aim for file ...
I want to take an arbitrary set of schemas and combine them into a single dataset that can be unpivoted later. What is the most stable way to do this? ...
I have a data connection source that creates two datasets: Dataset X (Snapshot) Dataset Y (Incremental) The two datasets pull from the same s ...
I have a pipeline setup in my Foundry instance that is using incremental computation but for some reason isn't doing what I expect. Namely, I want to ...
I have a large gzipped csv file (.csv.gz) uploaded to a dataset that's about 14GB in size and 40GB when uncompressed. Is there a way to decompress, rea ...
I notice when I run the same code as my example over here but with a union or unionByName or unionAll instead of the join, my query planning takes sig ...