簡體   English   中英

PySpark從現有列創建一個新列並包含值列表

[英]PySpark create new column from existing column with a list of values

我有一個像這樣的DataFrame:

from pyspark.sql import SparkSession
from pyspark import Row

spark = SparkSession.builder \
    .appName('DataFrame') \
    .master('local[*]') \
    .getOrCreate()

df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                            Row(a=2, b='', c=['0', '1'], d='bar'),
                            Row(a=3, b='', c=['0', '1'], d='foo')])

|  a|  b|     c|  d|
+---+---+------+---+
|  1|   |[0, 1]|foo|
|  2|   |[0, 1]|bar|
|  3|   |[0, 1]|foo|
+---+---+------+---+

我想創建列"e"與第一個元素"c"柱和"f"與第二個元素列"c"柱”,看起來像這樣:

|a  |b  |c     |d  |e  |f  |
+---+---+------+---+---+---+
|1  |   |[0, 1]|foo|0  |1  |
|2  |   |[0, 1]|bar|0  |1  |
|3  |   |[0, 1]|foo|0  |1  |
+---+---+------+---+---+---+
df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                            Row(a=2, b='', c=['0', '1'], d='bar'),
                            Row(a=3, b='', c=['0', '1'], d='foo')])

df2 = df.withColumn('e', df['c'][0]).withColumn('f', df['c'][1])
df2.show()

+---+---+------+---+---+---+
|a  |b  |c     |d  |e  |f  |
+---+---+------+---+---+---+
|1  |   |[0, 1]|foo|0  |1  |
|2  |   |[0, 1]|bar|0  |1  |
|3  |   |[0, 1]|foo|0  |1  |
+---+---+------+---+---+---+

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM