How can I rename a PySpark dataframe column by index? (handle duplicated column names)
I have an array that is made up of several arrays.

Zip the lists together, then call the DataFrame constructor:
df = spark.createDataFrame(zip(*all_data), cols)
df.show(truncate=False)
+-----------------------------+-----------+
|name |chromossome|
+-----------------------------+-----------+
|NM_019112.4(ABCA7):c.161-2A>T|19p13.3 |
|CCL2, 767C-G |17q11.2-q12|
+-----------------------------+-----------+
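To see why `zip(*all_data)` is the right shape for `createDataFrame`, here is a minimal sketch without Spark, assuming a hypothetical `all_data` holding one list per column (the values are taken from the output above; the variable contents are an assumption, not the asker's actual data):

```python
# Hypothetical input: one list per column, plus the column names.
all_data = [
    ["NM_019112.4(ABCA7):c.161-2A>T", "CCL2, 767C-G"],  # values for "name"
    ["19p13.3", "17q11.2-q12"],                          # values for "chromossome"
]
cols = ["name", "chromossome"]

# zip(*all_data) transposes the column-wise lists into row tuples,
# which is exactly the shape spark.createDataFrame(rows, cols) expects.
rows = list(zip(*all_data))
print(rows)
# → [('NM_019112.4(ABCA7):c.161-2A>T', '19p13.3'), ('CCL2, 767C-G', '17q11.2-q12')]
```

Passing `rows` and `cols` to `spark.createDataFrame` then yields the table shown above.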
Or use `zip_longest`:
from itertools import zip_longest
df = spark.createDataFrame(zip_longest(*all_data, fillvalue=''), cols)
df.show()
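The difference matters when the column lists have unequal lengths: plain `zip` truncates to the shortest list, while `zip_longest` pads the shorter ones with `fillvalue`. A small sketch with made-up lists (`a` and `b` are illustrative, not from the question):

```python
from itertools import zip_longest

a = ["r1", "r2", "r3"]  # hypothetical column with 3 values
b = ["x1", "x2"]        # shorter column with 2 values

truncated = list(zip(a, b))                      # drops the row for "r3"
padded = list(zip_longest(a, b, fillvalue=""))   # keeps it, padding with ""

print(truncated)  # → [('r1', 'x1'), ('r2', 'x2')]
print(padded)     # → [('r1', 'x1'), ('r2', 'x2'), ('r3', '')]
```

So `zip_longest` is the safer choice if you are not certain every inner array has the same length.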