
Removing spaces from data in a column of dataframe in scala spark

This is the command I am using to replace "." with "_" in a DataFrame column in Spark (Scala), and it works fine:

rfm = rfm.select(regexp_replace(col("tagname"),"\\.","_") as "tagname",col("value"),col("sensor_timestamp")).persist()

But the same approach does not remove the leading spaces from the data in that column:

rfm = rfm.select(regexp_replace(col("tagname")," ","") as "tagname",col("value"),col("sensor_timestamp")).persist()

There is no error. It just fails to remove any of the leading spaces that I see in the data.

Input: rfm.show()

+--------------------+-----+----------------+
|           tagname  |value|timestamp       |
+--------------------+-----+----------------+
|  P.A               |101.5|  1.409643313E12|
|  P.A               |100.5|  1.409643315E12|
|  P.A               |100.5|  1.409644709E12|
|P.B                 |  0.0|   1.40964471E12|

Output:

    +--------------------+-----+----------------+
    |          tagname   |value|timestamp       |
    +--------------------+-----+----------------+
    |  P_A               |101.5|  1.409643313E12|
    |  P_A               |100.5|  1.409643315E12|
    |  P_A               |100.5|  1.409644709E12|
    |P_B                 |  0.0|   1.40964471E12|

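A pattern containing a literal space matches only the space character itself. Use a whitespace pattern instead, and replace the matches with an empty string:

regexp_replace(col("tagname"), "\\s+", "")

\\s+ matches one or more whitespace characters (spaces, tabs, and so on). The backslash is doubled because the backslash in \s must itself be escaped inside a Scala string literal. Note that this removes every run of whitespace in the value, not only the leading one; to strip leading whitespace only, anchor the pattern as "^\\s+".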
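The difference between the patterns can be checked without a Spark session, since Spark's regexp_replace uses Java regular expressions, the same engine behind String.replaceAll. The following is a minimal sketch in plain Scala; the object name WhitespaceDemo and the sample values are hypothetical, and the presence of a tab in the real data is only an assumption about why the literal-space pattern appeared to do nothing.

```scala
// Plain Scala, no Spark required: String.replaceAll uses java.util.regex.
object WhitespaceDemo {
  // Hypothetical sample value: a tab and two spaces before the tag name.
  val raw: String = "\t  P.A"

  // The pattern " " matches only the literal space character,
  // so the leading tab is left in place.
  val spaceOnly: String = raw.replaceAll(" ", "")

  // "^\\s+" matches a run of whitespace anchored at the start of the string.
  val leadingStripped: String = raw.replaceAll("^\\s+", "")

  // "\\s+" with an empty replacement removes all whitespace,
  // including whitespace inside the value.
  val allStripped: String = "  P. A ".replaceAll("\\s+", "")
}
```

In the DataFrame code, the same patterns would go into regexp_replace(col("tagname"), "^\\s+", ""). If the unwanted characters are ordinary spaces only, the ltrim and trim functions in org.apache.spark.sql.functions may be a simpler choice.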


 