Multiple Spark DataFrame mutations in a single pipe
Consider a Spark DataFrame df with the following schema:
root
|-- date: timestamp (nullable = true)
|-- customerID: string (nullable = true)
|-- orderID: string (nullable = true)
|-- productID: string (nullable = true)
One column should be cast to a different type; the other columns should just have their whitespace trimmed.
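// Assumes a SparkSession named spark is in scope (needed for the $"..." syntax).
import org.apache.spark.sql.functions.{col, trim}
import org.apache.spark.sql.types.IntegerType
import spark.implicits._

df.select(
    $"date",
    df("customerID").cast(IntegerType),
    $"orderID",
    $"productID")
  .withColumn("orderID", trim(col("orderID")))
  .withColumn("productID", trim(col("productID")))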
The operations seem to require different syntax: casting is done via select, while trim is done via withColumn. I'm used to R and dplyr, where all of the above would be handled in a single mutate function, so mixing select and withColumn feels a bit cumbersome.
Is there a cleaner way to do this in a single pipe?
df.select(
$"date",
$"customerID".cast(IntegerType),
trim($"orderID").as("orderID"),
trim($"productID").as("productID"))
You can use either one. The difference is that withColumn adds a new column to the DataFrame (or replaces an existing one if the same name is used), while select keeps only the columns you specify. Choose whichever fits the situation.
The cast can be done using withColumn as follows:
df.withColumn("customerID", $"customerID".cast(IntegerType))
.withColumn("orderID", trim($"orderID"))
.withColumn("productID", trim($"productID"))
Note that you do not need to use withColumn on the date column above, since withColumn leaves every other column unchanged.
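If many columns need the same treatment, the chain of withColumn calls can be generated rather than written out by hand. The following is a sketch under assumptions, not part of the original answer: the object name FoldWithColumn, the toTrim list, and the sample row are hypothetical, and the date column is omitted for brevity.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, trim}
import org.apache.spark.sql.types.IntegerType

object FoldWithColumn {
  // Local session purely for illustration.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("fold-withColumn")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample row; customerID is a numeric string so the cast succeeds.
  val df: DataFrame =
    Seq(("42", " o-1 ", " p-9 ")).toDF("customerID", "orderID", "productID")

  // Names of the columns whose whitespace should be trimmed.
  val toTrim: Seq[String] = Seq("orderID", "productID")

  // Fold over the names, applying one withColumn per column, then cast customerID.
  val result: DataFrame = toTrim
    .foldLeft(df)((acc, name) => acc.withColumn(name, trim(col(name))))
    .withColumn("customerID", col("customerID").cast(IntegerType))

  def main(args: Array[String]): Unit = {
    result.printSchema()
    result.show()
    spark.stop()
  }
}
```

Each withColumn call adds a projection to the query plan, so for a very large number of columns a single select is often the lighter choice.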
The trim calls can also be done in a select as follows; here the column names are kept the same:
df.select(
$"date",
$"customerID".cast(IntegerType),
trim($"orderID").as("orderID"),
trim($"productID").as("productID"))
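For something closer in spirit to a single dplyr mutate, one option is to build the select list by mapping over df.columns and pattern matching on the column name. This is only a sketch and not from the original answer; the object name SelectByMapping and the sample row are hypothetical.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, trim}
import org.apache.spark.sql.types.IntegerType

object SelectByMapping {
  // Local session purely for illustration.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("select-by-mapping")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample row matching the schema in the question.
  val df: DataFrame =
    Seq((Timestamp.valueOf("2020-01-01 00:00:00"), "42", " o-1 ", " p-9 "))
      .toDF("date", "customerID", "orderID", "productID")

  // One expression per existing column: cast, trim, or pass through unchanged.
  val projected: Seq[Column] = df.columns.toSeq.map {
    case "customerID"                     => col("customerID").cast(IntegerType).as("customerID")
    case name @ ("orderID" | "productID") => trim(col(name)).as(name)
    case other                            => col(other)
  }

  // A single select applies every transformation and keeps the column order.
  val result: DataFrame = df.select(projected: _*)

  def main(args: Array[String]): Unit = {
    result.printSchema()
    result.show()
    spark.stop()
  }
}
```

Because the fallback case passes unknown columns through untouched, columns such as date do not have to be listed by hand.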