Spark: add a new column with a value from a previous column
I have a DataFrame like this:
+----------+---+
| code |idn|
+----------+---+
| [I0478]| 0|
| [B0527]| 1|
| [C0798]| 2|
| [C0059]| 3|
| [I0767]| 4|
| [I1001]| 5|
| [C0446]| 6|
+----------+---+
And I want to add a new column to the DataFrame:
+----------+---+------+
| code |idn| item |
+----------+---+------+
| [I0478]| 0| I0478|
| [B0527]| 1| B0527|
| [C0798]| 2| C0798|
| [C0059]| 3| C0059|
| [I0767]| 4| I0767|
| [I1001]| 5| I1001|
| [C0446]| 6| C0446|
+----------+---+------+
Please help me do this!
Use [] indexing:

df.withColumn("item", df["code"][0])
The problem will be evident if you look at the schema: the column you are trying to subset is not an array. So the solution is to expand the column with .*:
df.select('code.*', 'idn')
Python (pandas):
import pandas as pd

data = {'code': [['I0478'], ['B0527'], ['C0798'], ['C0059'], ['I0767'], ['I1001'], ['C0446']],
        'idn': [0, 1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
# Each 'code' entry is a one-element list; take that element directly.
df['item'] = df.apply(lambda row: row.code[0], axis=1)
print(df)
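A shorter pandas alternative (same assumed data) uses the .str accessor, which also indexes into list-valued entries, avoiding the row-wise apply:

```python
import pandas as pd

df = pd.DataFrame({
    "code": [["I0478"], ["B0527"], ["C0798"]],
    "idn": [0, 1, 2],
})

# .str[0] pulls the first element out of each one-item list.
df["item"] = df["code"].str[0]
print(df)
```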
df.withColumn("item", df["code"][0])
This works if the "code" column is an Array type. If it is instead a Struct of String, you may need to inspect an element via df.select("code").collect()[0] to check which (string) key it has.