Replace a column value in a Spark DataFrame
Could you please help me to replace column values in a Spark DataFrame:
data = [["1", "xxx", "company 0"],
["2", "xxx", "company 1"],
["3", "company 44", "company 2"],
["4", "xxx", "company 1"],
["5", "bobby", "company 1"]]
dataframe = spark.createDataFrame(data)
I am trying to replace "company" with "cmp". "company" can appear in different columns.
Because "company" may appear in any column, you have to loop through each column and apply regexp_replace to each of them:
from pyspark.sql import functions as F
cols = dataframe.columns
for c in cols:
    dataframe = dataframe.withColumn(c, F.regexp_replace(c, 'company', 'cmp'))
+---+------+-----+
| _1| _2| _3|
+---+------+-----+
| 1| xxx|cmp 0|
| 2| xxx|cmp 1|
| 3|cmp 44|cmp 2|
| 4| xxx|cmp 1|
| 5| bobby|cmp 1|
+---+------+-----+
Functional programming approach:
from functools import reduce
from pyspark.sql import functions as F
cols = dataframe.columns
reduce(lambda dataframe, c: dataframe.withColumn(c, F.regexp_replace(c, 'company', 'cmp')), cols, dataframe).show()