How can I add or replace a row at a specific index in a PySpark DataFrame?
I want to add the list L1 as the row at the first index. How can I add (append) a row at a specific index in a PySpark DataFrame?
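from pyspark.sql import functions as F, Window
from pyspark.sql.types import StructType, StructField, StringType, FloatType
L1 = ['na', 5.6, 2.4]
data = [('fr', 8.8, 6.6),
        ('nr', 4.4, 2.5),
        ('cc', 2.3, 3.9)]
data_schema = [StructField('loc', StringType(), True), StructField('col', FloatType(), True), StructField('io', FloatType(), True)]
final = StructType(fields=data_schema)
df = spark.createDataFrame(data, schema=final)
# Number the rows so that each one has an explicit index
df = df.withColumn("idx", F.row_number().over(Window.orderBy('col')))
df.show()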
+---+----+---+---+
|loc| col| io|idx|
+---+----+---+---+
| fr| 8.8|6.6| 1|
| nr| 4.4|2.5| 2|
| cc| 2.3|3.9| 3|
+---+----+---+---+
You can drop the row currently at that index with the filter idx != 1, then use union to add the new row in its place:
from pyspark.sql import functions as F, Window
L1 = ['na',5.6,2.4]
data = [('fr',8.8,6.6),
('nr',4.4,2.5),
('cc',2.3,3.9)]
df = spark.createDataFrame(data, ['loc', 'col', 'io'])
df2 = df.withColumn(
"idx",
F.row_number().over(Window.orderBy('loc'))
).filter('idx != 1').union(spark.createDataFrame([L1 + [1]]))
df2.show()
+---+---+---+---+
|loc|col| io|idx|
+---+---+---+---+
| fr|8.8|6.6| 2|
| nr|4.4|2.5| 3|
| na|5.6|2.4| 1|
+---+---+---+---+