How to concatenate 2 columns, which contain np arrays, into a new column in a dataframe
Original dataframe:
A                      B
np.array([1, 2, 3])    np.array([4, 5, 6])
np.array([7, 8, 9])    np.array([9, 10, 11])
I want it to become:
A                      B                       C
np.array([1, 2, 3])    np.array([4, 5, 6])     np.array([1, 2, 3, 4, 5, 6])
np.array([7, 8, 9])    np.array([9, 10, 11])   np.array([7, 8, 9, 9, 10, 11])
How can this be achieved?
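A minimal, self-contained sketch of the setup (the DataFrame construction is an assumption, since the question only shows the desired layout) with one way to build column C by concatenating each row's pair of arrays:

```python
import numpy as np
import pandas as pd

# Build the example DataFrame: each cell holds a NumPy array
df = pd.DataFrame({
    'A': [np.array([1, 2, 3]), np.array([7, 8, 9])],
    'B': [np.array([4, 5, 6]), np.array([9, 10, 11])],
})

# Concatenate the two arrays row by row into a new column C
df['C'] = [np.concatenate([a, b]) for a, b in zip(df['A'], df['B'])]

print(df.loc[0, 'C'])  # array([1, 2, 3, 4, 5, 6])
```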
Option 1:
In [66]: df['C'] = [np.append(*x) for x in df[['A', 'B']].values]
In [67]: df
Out[67]:
A B C
0 [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [7, 8, 9] [9, 10, 11] [7, 8, 9, 9, 10, 11]
Option 2: df['C'] = [np.concatenate(x) for x in df[['A', 'B']].values]
Option 3: df['C'] = list(map(np.concatenate, df[['A', 'B']].values)) (in Python 3, map returns a lazy iterator, so it must be wrapped in list() before assigning it as a column)
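The options differ only in how each row's pair of arrays is handed to NumPy: `np.append(*x)` unpacks the two cells as separate arguments, while `np.concatenate(x)` takes the whole sequence at once. A small sketch (variable names are illustrative) showing the two calls produce the same result:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
row = [a, b]  # what df[['A', 'B']].values yields per row

# np.append(*row) unpacks to np.append(a, b)
appended = np.append(*row)

# np.concatenate accepts the sequence of arrays directly
concatenated = np.concatenate(row)

print(appended)      # [1 2 3 4 5 6]
print(concatenated)  # [1 2 3 4 5 6]
```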
Test:
In [69]: df.loc[0, 'C']
Out[69]: array([1, 2, 3, 4, 5, 6])