PySpark: updating multiple columns

+----------+---------------+--------------------+--------------+-------+-----------+-----------+---+
|      date|  serial_number|               model|capacity_bytes|failure|smart_1_raw|smart_3_raw|...|
+----------+---------------+--------------------+--------------+-------+-----------+-----------+---+
|2018-09-23|       ZJV01VV0|       ST12000NM0007|12000138625024|      0|   32985096|          0|...|
|2018-09-23|       ZJV01VV5|       ST12000NM0007|12000138625024|      0|   77197496|          0|...|
|2018-09-23| PL2331LAH3XLZJ|HGST HMS5C4040BLE640| 4000787030016|      0|          0|          0|...|
|2018-09-23|       ZCH0ATJY|       ST12000NM0007|12000138625024|      0|   51954552|          0|...|
|2018-09-23|       ZA1816EB|        ST8000NM0055| 8001563222016|      0|  129696704|          0|...|
|2018-09-23|       ZA13ZKX8|         ST8000DM002| 8001563222016|      0|   89446512|          0|...|
|2018-09-23| PL2331LAHDB5PJ|HGST HMS5C4040BLE640| 4000787030016|      0|          0|        442|...|
|2018-09-23|       ZA1816E1|        ST8000NM0055| 8001563222016|      0|    8437320|          0|...|
|2018-09-23| PL2331LAH3WM1J|HGST HMS5C4040BLE640| 4000787030016|      0|          0|          0|...|
|2018-09-23|       S30108NT|         ST4000DM000| 4000787030016|      0|   11197576|          0|...|
|2018-09-23|       ZJV01VVG|       ST12000NM0007|12000138625024|      0|  172268856|          0|...|
|2018-09-23|       ZJV01VVM|       ST12000NM0007|12000138625024|      0|  101040904|          0|...|
|2018-09-23|       ZA174KPY|        ST8000NM0055| 8001563222016|      0|   50287344|          0|...|
|2018-09-23| PL2331LAH3W4XJ|HGST HMS5C4040BLE640| 4000787030016|      0|          0|        530|...|
|2018-09-23|       Z4D068HF|         ST6000DX000| 6001175126016|      0|  232934432|          0|...|
+----------+---------------+--------------------+--------------+-------+-----------+-----------+---+

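In this table (which continues with more columns such as smart_4_raw, smart_5_raw, smart_7_raw, smart_9_raw and smart_10_raw), every column has the dtype string. I need to return a new DataFrame in which only the columns with "smart" in their name are cast to float. The float values should also not be shown in exponential form, e.g. 1.05757484854 instead of 1.3434E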

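Use DecimalType instead of a float type, and set the precision and scale you need. Spark prints large float and double values in exponential notation (for example 3.2985096E7), whereas a decimal column such as DecimalType(18, 2) is shown in plain notation (32985096.00).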
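DecimalType(precision, scale) stores at most precision digits in total, scale of them after the decimal point, so DecimalType(18, 2) only accepts values whose absolute value is below 10^16 once rounded to two decimals. A string that does not fit is not silently truncated: depending on spark.sql.ansi.enabled, the cast either returns null or raises an error. The helper below, fits_decimal, is hypothetical (not part of PySpark); it is a rough approximation of that rule using Python's decimal module with half-up rounding, meant for checking a few sample values before picking a precision.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP, localcontext


def fits_decimal(text, precision=18, scale=2):
    """Hypothetical helper: can the string `text` be stored as DECIMAL(precision, scale)?

    precision = total number of digits, scale = digits after the decimal point,
    so at most (precision - scale) digits may appear before the point.
    """
    with localcontext() as ctx:
        ctx.prec = 100  # generous working precision so the rounding step itself never overflows
        try:
            value = Decimal(text.strip())
            if not value.is_finite():
                return False  # NaN / Infinity cannot be stored as a decimal
            # Round to `scale` fractional digits, half-up (quantum 0.01 when scale=2).
            rounded = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
        except InvalidOperation:
            return False  # not a number, or far too large to round at this precision
        # Whatever is left of the decimal point must fit in precision - scale digits.
        return abs(rounded) < Decimal(10) ** (precision - scale)


print(fits_decimal("32985096"))  # smart_1_raw value from the table -> True
print(fits_decimal("1e16"))      # needs 17 integer digits -> False
```

For instance, fits_decimal("232934432", 10, 2) returns False, because DECIMAL(10, 2) leaves only 8 digits before the point.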

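from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType
# Every column whose name starts with "smart" (smart_1_raw, smart_3_raw, ...)
list_smart_cols = [c for c in df.columns if c.startswith('smart')]
for c in list_smart_cols:
    # DecimalType(precision, scale): change the 2nd argument to keep more or fewer decimals
    df = df.withColumn(c, col(c).cast(DecimalType(18, 2)))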
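With dozens of smart_*_raw columns, the loop above calls withColumn once per column. The PySpark API reference for DataFrame.withColumn notes that each call introduces a projection and suggests a single select() with multiple columns instead. The sketch below does the same DecimalType cast in one select; the helper names smart_columns and cast_smart_columns and the default "smart" prefix are illustrative assumptions, not an existing API.

```python
def smart_columns(columns, prefix="smart"):
    """Return the column names that start with `prefix`, in their original order."""
    return [c for c in columns if c.startswith(prefix)]


def cast_smart_columns(df, precision=18, scale=2, prefix="smart"):
    """Return a new DataFrame whose `prefix` columns are cast to DecimalType(precision, scale).

    All other columns are passed through unchanged; column order and names are kept.
    """
    # Imported inside the function so that smart_columns() works without PySpark installed.
    from pyspark.sql.functions import col
    from pyspark.sql.types import DecimalType

    targets = set(smart_columns(df.columns, prefix))
    # One select() builds a single projection instead of one withColumn() call per column.
    return df.select(*[
        col(c).cast(DecimalType(precision, scale)).alias(c) if c in targets else col(c)
        for c in df.columns
    ])
```

Assuming df is the all-string DataFrame from the question, df2 = cast_smart_columns(df) followed by df2.printSchema() should list each smart_*_raw column as decimal(18,2) and leave date, serial_number, model, capacity_bytes and failure as string. Pass a larger scale, for example cast_smart_columns(df, precision=24, scale=11), to keep more decimal places.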
