Create Rows based on Column
I want to create additional rows based on the values in a column.
For example, I have the following data frame.
| lookup_name | alt_name | inventory | location |
|-------------|----------|-----------|----------|
| Honda | Car | 1 | au |
| Apple | Fruit | 1 | us |
I want to convert it to the following:
| lookup_name | inventory | location |
|-------------|-----------|----------|
| Honda | 1 | au |
| Car | 1 | au |
| Apple | 1 | us |
| Fruit | 1 | us |
The alt_name column is removed, and the inventory and location values are copied to each new lookup_name row.
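from pyspark.sql import SparkSession
from pyspark.sql.functions import array, col, explode

# In the pyspark shell or a notebook, `spark` usually already exists.
spark = SparkSession.builder.getOrCreate()

data = [
    ('Honda', 'Car', 1, 'au'),
    ('Apple', 'Fruit', 1, 'us'),
]
df = spark.createDataFrame(data, ['lookup_name', 'alt_name', 'inventory', 'location'])
(
    df
    # Pack both names into one array, then emit one row per array element.
    .withColumn('lookup_name', explode(array(col('lookup_name'), col('alt_name'))))
    .drop('alt_name')
    .show(10, False)
)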
# +-----------+---------+--------+
# |lookup_name|inventory|location|
# +-----------+---------+--------+
# |Honda |1 |au |
# |Car |1 |au |
# |Apple |1 |us |
# |Fruit |1 |us |
# +-----------+---------+--------+
array(col('lookup_name'), col('alt_name')) combines the two name columns into a single array column:
=> ['Honda', 'Car']
df.withColumn('lookup_name', array(col('lookup_name'), col('alt_name'))).show(10, False)
# +--------------+--------+---------+--------+
# |lookup_name |alt_name|inventory|location|
# +--------------+--------+---------+--------+
# |[Honda, Car] |Car |1 |au |
# |[Apple, Fruit]|Fruit |1 |us |
# +--------------+--------+---------+--------+
pyspark.sql.functions.explode(col)
Returns a new row for each element in the given array or map.
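To make the two steps concrete without a Spark session, here is a minimal plain-Python sketch of what array(...) followed by explode(...) does to each row. It is only a model of the semantics, not how Spark implements it; the helper name explode_names and the dict-based rows are assumptions made for illustration.

```python
# Plain-Python model of array(...) followed by explode(...); no Spark required.
# The row layout mirrors the example data frame from the question.
rows = [
    {"lookup_name": "Honda", "alt_name": "Car", "inventory": 1, "location": "au"},
    {"lookup_name": "Apple", "alt_name": "Fruit", "inventory": 1, "location": "us"},
]


def explode_names(row):
    """Hypothetical helper: yield one output row per name, copying the other columns."""
    # Step 1: the counterpart of array(col('lookup_name'), col('alt_name')).
    names = [row["lookup_name"], row["alt_name"]]
    # Step 2: the counterpart of explode(...) combined with drop('alt_name').
    for name in names:
        yield {
            "lookup_name": name,
            "inventory": row["inventory"],
            "location": row["location"],
        }


result = [out for row in rows for out in explode_names(row)]
for out in result:
    print(out)
```

Each input row yields two output rows, one per element of the intermediate list, and the inventory and location values are repeated on both.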