PySpark create new column from existing column with a list of values
I have a DataFrame like this:
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder \
    .appName('DataFrame') \
    .master('local[*]') \
    .getOrCreate()

df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                            Row(a=2, b='', c=['0', '1'], d='bar'),
                            Row(a=3, b='', c=['0', '1'], d='foo')])
+---+---+------+---+
|  a|  b|     c|  d|
+---+---+------+---+
|  1|   |[0, 1]|foo|
|  2|   |[0, 1]|bar|
|  3|   |[0, 1]|foo|
+---+---+------+---+
I want to create a column "e" containing the first element of column "c", and a column "f" containing the second element of column "c", so the result looks like this:
|a |b |c |d |e |f |
+---+---+------+---+---+---+
|1 | |[0, 1]|foo|0 |1 |
|2 | |[0, 1]|bar|0 |1 |
|3 | |[0, 1]|foo|0 |1 |
+---+---+------+---+---+---+
df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                            Row(a=2, b='', c=['0', '1'], d='bar'),
                            Row(a=3, b='', c=['0', '1'], d='foo')])

# Index into the array column: df['c'][i] extracts element i (0-based)
df2 = df.withColumn('e', df['c'][0]).withColumn('f', df['c'][1])
df2.show()
+---+---+------+---+---+---+
|a |b |c |d |e |f |
+---+---+------+---+---+---+
|1 | |[0, 1]|foo|0 |1 |
|2 | |[0, 1]|bar|0 |1 |
|3 | |[0, 1]|foo|0 |1 |
+---+---+------+---+---+---+