Select columns from a dataframe into another dataframe based on column datatype in Apache Spark Scala
I have a Spark dataframe:
inputDF: org.apache.spark.sql.DataFrame = [_id: string, Frequency: double, Monterary: double, Recency: double, CustID: string]
root
|-- _id: string (nullable = false)
|-- Frequency: double (nullable = false)
|-- Monterary: double (nullable = false)
|-- Recency: double (nullable = false)
|-- CustID: string (nullable = false)
I want to create a new dataframe by dropping the string columns from this one. The specific condition is not to iterate over the column names.
Does anyone have any idea?
If the schema is flat and contains only simple types you can filter over the fields, but unless you have a crystal ball you cannot really avoid iteration:
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.functions.col

// Keep every column whose data type is not StringType
df.select(df.schema.fields.flatMap(f => f.dataType match {
  case StringType => Nil              // drop string columns
  case _          => col(f.name) :: Nil
}): _*)
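An equivalent sketch, collecting the names of the string columns and passing them to `drop` instead of building the `select` list (assuming `df` is the `inputDF` above and a SparkSession is already running):

```scala
import org.apache.spark.sql.types.StringType

// Names of all columns whose type is StringType (here: _id, CustID)
val stringCols = df.schema.fields.collect {
  case f if f.dataType == StringType => f.name
}

// drop accepts multiple column names as varargs
val result = df.drop(stringCols: _*)
// result would contain only Frequency, Monterary and Recency
```

Either way the schema's fields are traversed once; there is no way to know which columns are strings without inspecting them.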