
Concat all columns with PySpark

I need to write a .txt file from a dataframe. I read that to do that, the dataframe has to have a single column.
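For reference, this is because DataFrameWriter.text writes exactly one string column, with each row becoming one line of the output. A minimal sketch of the final write step (the output path is hypothetical):

# write the single-column dataframe as plain text; each row becomes one line
dataframe.write.text("output_dir")  # hypothetical output directory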

I'm trying to do that like this:

dataframe = dataframe.select(concat(*dataframe.columns).alias("Data"))

But it doesn't work; I think unpacking the columns causes some problems, and I don't want to pass all the column names explicitly. Does anyone have an idea? Thank you.

This is the output after updating the code, thanks to @Jonathan Lam:

from pyspark.sql.functions import col, concat

dataframe.show(truncate=False)
print(*[col(column) for column in dataframe.columns])
dataframe = dataframe.select(concat(*[col(column) for column in dataframe.columns]).alias("Data"))
dataframe.show(truncate=False)

[screenshot of the dataframe output before and after the concat]

Finally I found the problem: when concat meets a null value, the whole result becomes null, so I have to find a way around that.
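One way around that (a sketch on my part, not from the original thread) is concat_ws, which ignores null inputs instead of nulling out the whole result:

from pyspark.sql.functions import col, concat_ws

# concat_ws skips null values, so one null column no longer blanks the row;
# an empty separator reproduces plain concatenation
dataframe = dataframe.select(
    concat_ws("", *[col(c) for c in dataframe.columns]).alias("Data")
)

Alternatively, each column can be wrapped in coalesce(col(c), lit("")) before a plain concat.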

I think your approach should be correct. I just tested it:

from pyspark.sql import SparkSession
import pyspark.sql.functions as func

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("1", "2", "3")],
    schema=['col1', 'col2', 'col3']
)

df.show(3, False)
+----+----+----+
|col1|col2|col3|
+----+----+----+
|1   |2   |3   |
+----+----+----+

But I am using the PySpark API col:

df.select(
    func.concat(*[func.col(col) for col in df.columns]).alias('concat')
).show(10, False)
+------+
|concat|
+------+
|123   |
+------+
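As a side note, keeping the func. prefix throughout means the loop variable col can never shadow a col function imported directly from pyspark.sql.functions, which is a common source of confusing errors with this pattern.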
