Spark: add a new column with a value from a previous column
I have a DataFrame like this:
+----------+---+
| code |idn|
+----------+---+
| [I0478]| 0|
| [B0527]| 1|
| [C0798]| 2|
| [C0059]| 3|
| [I0767]| 4|
| [I1001]| 5|
| [C0446]| 6|
+----------+---+
And I want to add a new column to the DataFrame:
+----------+---+------+
| code |idn| item |
+----------+---+------+
| [I0478]| 0| I0478|
| [B0527]| 1| B0527|
| [C0798]| 2| C0798|
| [C0059]| 3| C0059|
| [I0767]| 4| I0767|
| [I1001]| 5| I1001|
| [C0446]| 6| C0446|
+----------+---+------+
Please help me do this!
Use [] indexing:

df.withColumn("item", df["code"][0])
The problem will be evident if you look at the schema: the column you are trying to subset is not an array. So the solution is to expand the column with .*:
df.select('code.*', 'idn')
Python (pandas):
import pandas as pd

data = {'code': [['I0478'], ['B0527'], ['C0798'], ['C0059'], ['I0767'], ['I1001'], ['C0446']],
        'idn': [0, 1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
# Each 'code' entry is a one-element list; take that element directly.
df['item'] = df.apply(lambda row: row.code[0], axis=1)
print(df)
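A shorter pandas alternative (same assumed data) uses the .str accessor, which also indexes into list-valued entries, avoiding the row-wise apply:

```python
import pandas as pd

df = pd.DataFrame({
    "code": [["I0478"], ["B0527"], ["C0798"]],
    "idn": [0, 1, 2],
})

# .str[0] pulls the first element out of each one-item list.
df["item"] = df["code"].str[0]
print(df)
```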
df.withColumn("item", df["code"][0])
This works if the "code" column is an Array type. If it is instead a Struct of String, you may need to inspect an element via df.select("code").collect()[0] to check which (string) key it has.