
How to remove every space inside a string with PySpark?

df1 = spark.read.csv('/content/drive/MyDrive/BigData2021/Lecture23/datasets/cities.csv', header = True, inferSchema= True)
import pyspark.sql.functions as F

for name in df1.columns:
    df1 = df1.withColumn(name, F.trim(df1[name]))

df1.show()

Here is my piece of code. I am trying to trim every space in the column headers and also in the values, but it doesn't work. I need a function I can reuse on every other DataFrame.

You can use regexp_replace to replace whitespace in column values with the empty string "".

You can use replace to remove spaces from the column names.

from pyspark.sql import functions as F

df = spark.createDataFrame([("col1 with spaces  ", "col 2 with spaces", ), ], ("col 1", "col 2"))

"""
+------------------+-----------------+
|             col 1|            col 2|
+------------------+-----------------+
|col1 with spaces  |col 2 with spaces|
+------------------+-----------------+
"""
select_expr = [F.regexp_replace(F.col(c), r"\s", "").alias(c.replace(" ", "")) for c in df.columns]

df.select(*select_expr).show()

"""
+--------------+--------------+
|          col1|          col2|
+--------------+--------------+
|col1withspaces|col2withspaces|
+--------------+--------------+
"""
