簡體   English   中英

pyspark更新多列

[英]pyspark updating multiple columns

+----------+---------------+--------------------+--------------+-------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------+
|      date|  serial_number|               model|capacity_bytes|failure|smart_1_raw|smart_3_raw|smart_4_raw|smart_5_raw|smart_7_raw|smart_9_raw|smart_10_raw|s
+----------+---------------+--------------------+--------------+-------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------+
|2018-09-23|       ZJV01VV0|       ST12000NM0007|12000138625024|      0|   32985096|          0|  
|2018-09-23|       ZJV01VV5|       ST12000NM0007|12000138625024|      0|   77197496|          0|  
|2018-09-23| PL2331LAH3XLZJ|HGST HMS5C4040BLE640| 4000787030016|      0|          0|          0| 
|2018-09-23|       ZCH0ATJY|       ST12000NM0007|12000138625024|      0|   51954552|          0|  
|2018-09-23|       ZA1816EB|        ST8000NM0055| 8001563222016|      0|  129696704|          0| 
|2018-09-23|       ZA13ZKX8|         ST8000DM002| 8001563222016|      0|   89446512|          0| 
|2018-09-23| PL2331LAHDB5PJ|HGST HMS5C4040BLE640| 4000787030016|      0|          0|        442| 
|2018-09-23|       ZA1816E1|        ST8000NM0055| 8001563222016|      0|    8437320|          0| 
|2018-09-23| PL2331LAH3WM1J|HGST HMS5C4040BLE640| 4000787030016|      0|          0|          0| 
|2018-09-23|       S30108NT|         ST4000DM000| 4000787030016|      0|   11197576|          0| 
|2018-09-23|       ZJV01VVG|       ST12000NM0007|12000138625024|      0|  172268856|          0|  
|2018-09-23|       ZJV01VVM|       ST12000NM0007|12000138625024|      0|  101040904|          0|  
|2018-09-23|       ZA174KPY|        ST8000NM0055| 8001563222016|      0|   50287344|          0| 
|2018-09-23| PL2331LAH3W4XJ|HGST HMS5C4040BLE640| 4000787030016|      0|          0|        530| 
|2018-09-23|       Z4D068HF|         ST6000DX000| 6001175126016|      0|  232934432|          0| 

表中的所有數據類型都是字符串。 但是,我需要返回一個新的數據框,其中我只將名稱中帶有“smart”的列轉換為浮點數。 此外,浮點結果不應該是指數形式。 例如:1.05757484854 而不是 1.3434E

根據需要使用DecimalType()設置適當的精度。

from pyspark.sql.types import DecimalType
list_smart_cols = [i for i in df.columns if i[:len('smart')]=='smart']
for c in list_smart_cols:
    df = df.withColumn(c,col(c).cast(DecimalType(18,2))) # Adjust the number of decimals
                                                         # by changing the 2nd argument.

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM