
pyspark add new row to dataframe

I am trying to add a new row to a DataFrame but can't.

My code:

newRow = Row(id='ID123')
newDF= df.insertInto(newRow)
 or 
newDF= df.union(newRow)

Errors:

AttributeError: _jdf

AttributeError: 'DataFrame' object has no attribute 'insertInto'

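A simple way to add a row to a DataFrame in PySpark:

# Build a one-row DataFrame that reuses df's schema (assumes df has three matching columns)
newRow = spark.createDataFrame([(15, 'Alk', 'Dhl')], schema=df.schema)
# union() matches columns by position, not by name
df = df.union(newRow)
df.show()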

Try this (see the documentation):

from pyspark.sql import Row
# sc is the active SparkContext, e.g. the one provided by the pyspark shell
newDF = sc.parallelize([Row(id='ID123')]).toDF()
newDF.show()

An operation like this is completely useless in practice. A Spark DataFrame is a data structure designed for bulk analytical jobs. It is not intended for fine-grained updates.

Although you can create a single-row DataFrame (as shown by innm) and union it, this won't scale and won't truly distribute the data: Spark will have to keep a local copy of the data, and the execution plan will grow linearly with the number of inserted objects.

Please consider using a proper database instead.


 