I am trying to add a new row to a DataFrame but can't.
My code:
newRow = Row(id='ID123')
newDF = df.insertInto(newRow)
or
newDF = df.union(newRow)
Errors:
AttributeError: _jdf
AttributeError: 'DataFrame' object has no attribute 'insertInto'
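A simple way to add a row to a DataFrame in PySpark is to build the row as a one-row DataFrame and union it (union matches columns by position, so the new DataFrame must have the same columns, in the same order, as df):
newRow = spark.createDataFrame([(15, 'Alk', 'Dhl')])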
df = df.union(newRow)
df.show()
Try (Documentation):
from pyspark.sql import Row
newDF = sc.parallelize([Row(id='ID123')]).toDF()
newDF.show()
An operation like this is completely useless in practice. A Spark DataFrame is a data structure designed for bulk analytical jobs; it is not intended for fine-grained updates.
Although you can create a single-row DataFrame (as shown by innm) and union it, this won't scale and won't truly distribute the data: Spark will have to keep a local copy of the data, and the execution plan will grow linearly with the number of inserted objects.
Please consider using a proper database instead.