
Spark: add new column with value from previous columns

I have a DataFrame like this:

+----------+---+
|   code   |idn|
+----------+---+
|   [I0478]|  0|
|   [B0527]|  1|
|   [C0798]|  2|
|   [C0059]|  3|
|   [I0767]|  4|
|   [I1001]|  5|
|   [C0446]|  6|
+----------+---+

And I want to add a new column to the DataFrame:

+----------+---+------+
|   code   |idn| item |
+----------+---+------+
|   [I0478]|  0| I0478|
|   [B0527]|  1| B0527|
|   [C0798]|  2| C0798|
|   [C0059]|  3| C0059|
|   [I0767]|  4| I0767|
|   [I1001]|  5| I1001|
|   [C0446]|  6| C0446|
+----------+---+------+

Please help me do this!

Use []:

df.withColumn("item", df["code"][0])

So the problem will be evident if you look at the schema - the column you are trying to subset is not an array. So the solution is to .* expand the column:

df.select('code.*', 'idn')

Python:

import pandas as pd

data = {'code': [['I0478'], ['B0527'], ['C0798'], ['C0059'], ['I0767'], ['I1001'], ['C0446']],
        'idn': [0, 1, 2, 3, 4, 5, 6]}

df = pd.DataFrame(data)

# Each 'code' cell is a one-element list; take its first element directly
# (more robust than converting to a string and stripping brackets/quotes)
df['item'] = df.apply(lambda row: row.code[0], axis=1)

print(df)
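As a vectorized alternative to the row-wise apply above, pandas' `.str` accessor also supports positional indexing on list-valued columns, so the same column can be built without a lambda. A minimal sketch using the same sample data as the question:

```python
import pandas as pd

# Same sample data as the question: each 'code' cell holds a one-element list
df = pd.DataFrame({
    'code': [['I0478'], ['B0527'], ['C0798'], ['C0059'], ['I0767'], ['I1001'], ['C0446']],
    'idn': [0, 1, 2, 3, 4, 5, 6],
})

# .str[0] indexes into each list element, returning its first item
df['item'] = df['code'].str[0]

print(df)
```

This avoids Python-level iteration per row and returns NaN (rather than raising) for any empty lists.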
df.withColumn("item", df["code"][0])

This works if the "code" column is of Array type. If it is a Struct of String instead, you may need to inspect an element via df.select("code").collect()[0] to see which (string) keys it has.


