Replace a column value in a Spark DataFrame

Could you please help me replace column values in a Spark DataFrame?

data = [["1", "xxx", "company 0"],
        ["2", "xxx", "company 1"],
        ["3", "company 44", "company 2"],
        ["4", "xxx", "company 1"],
        ["5", "bobby", "company 1"]]


dataframe = spark.createDataFrame(data)

I am trying to replace "company" with "cmp". The string "company" can appear in any of the columns.

Because "company" may appear in any column, you have to loop through every column and apply regexp_replace to each of them:

from pyspark.sql import functions as F

cols = dataframe.columns

# Rewrite every column in turn, replacing 'company' with 'cmp'
for c in cols:
    dataframe = dataframe.withColumn(c, F.regexp_replace(c, 'company', 'cmp'))

dataframe.show()

+---+------+-----+
| _1|    _2|   _3|
+---+------+-----+
|  1|   xxx|cmp 0|
|  2|   xxx|cmp 1|
|  3|cmp 44|cmp 2|
|  4|   xxx|cmp 1|
|  5| bobby|cmp 1|
+---+------+-----+
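
One thing to watch for: `regexp_replace` treats its pattern as a Java regular expression and matches case-sensitively, so a value such as "Company 3" would be left unchanged by the pattern above. The sketch below uses Python's `re` module as a local stand-in (no Spark session needed) to show the difference. The sample values and the `(?i)` variant are assumptions for illustration and are not part of the original question.

```python
import re

# Hypothetical sample values; "Company 3" is not in the question's data.
values = ["company 0", "Company 3", "xxx"]

# Case-sensitive pattern, as used in the answer above.
sensitive = [re.sub("company", "cmp", v) for v in values]

# The inline flag (?i) makes the match case-insensitive.
insensitive = [re.sub("(?i)company", "cmp", v) for v in values]

print(sensitive)    # ['cmp 0', 'Company 3', 'xxx']
print(insensitive)  # ['cmp 0', 'cmp 3', 'xxx']
```

Java regular expressions also accept the `(?i)` inline flag, so the same pattern string should carry over to `F.regexp_replace(c, '(?i)company', 'cmp')` if mixed-case values need to be covered.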

Functional programming approach:

from functools import reduce
from pyspark.sql import functions as F
cols = dataframe.columns
reduce(
    lambda df, c: df.withColumn(c, F.regexp_replace(c, 'company', 'cmp')),
    cols,
    dataframe,
).show()
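
To see why the `reduce` call is equivalent to the loop, here is a small sketch with a plain dict of lists standing in for the DataFrame. The names `table`, `with_column`, and `replace_in` are hypothetical stand-ins for illustration only; they are not Spark APIs.

```python
from functools import reduce
import re

# Hypothetical stand-in for a DataFrame: column name -> list of values.
table = {
    "_1": ["1", "2", "3"],
    "_2": ["xxx", "xxx", "company 44"],
    "_3": ["company 0", "company 1", "company 2"],
}

def with_column(tbl, name, values):
    """Return a new table with one column replaced (mimics withColumn)."""
    return {**tbl, name: values}

def replace_in(tbl, col):
    """Apply the substitution to a single column of the table."""
    return with_column(tbl, col, [re.sub("company", "cmp", v) for v in tbl[col]])

# Loop version: reassign the table once per column.
looped = table
for c in table:
    looped = replace_in(looped, c)

# reduce version: the accumulator plays the role of the reassigned variable.
folded = reduce(replace_in, table, table)

print(folded["_2"])  # ['xxx', 'xxx', 'cmp 44']
```

Both versions produce the same result; `reduce` simply passes the intermediate table from one step to the next instead of rebinding a variable.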
