
Pyspark replace strings in Spark dataframe column by using values in another column

I'd like to replace a value present in a column by creating a search string from another column.

Before:

id  address      st
1   2.PA1234.la  1234
2   10.PA125.la  125
3   2.PA156.ln   156

After:

id  address      st
1   2.PA9999.la  1234
2   10.PA9999.la 125
3   2.PA9999.ln  156

I tried

df.withColumn("address", regexp_replace("address","PA"+st,"PA9999"))
df.withColumn("address", regexp_replace("address","PA"+df.st,"PA9999"))

Both seem to fail with:

TypeError: 'Column' object is not callable

This could be similar to Pyspark replace strings in Spark dataframe column.

You might also use a Spark UDF.

The solution can be applied whenever you need to modify a dataframe entry with a value from another column:

import pandas as pd

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

pd_input = pd.DataFrame({'address': ['2.PA1234.la', '10.PA125.la', '2.PA156.ln'],
                         'st': ['1234', '125', '156']})

spark_df = spark.createDataFrame(pd_input)

# Replace each row's st value inside its address with '9999'.
replace_udf = udf(lambda address, st: address.replace(st, '9999'), StringType())

spark_df.withColumn('address_new', replace_udf(col('address'), col('st'))).show()

Output:

+-----------+----+------------+
|    address|  st| address_new|
+-----------+----+------------+
|2.PA1234.la|1234| 2.PA9999.la|
|10.PA125.la| 125|10.PA9999.la|
| 2.PA156.ln| 156| 2.PA9999.ln|
+-----------+----+------------+
