How to create an array or collection of Dataset&lt;Row&gt; of Spark DataFrame type?
I am working on filtering an Avro file in Spark using Java. I get different DataFrames for different types of filter conditions (equal to, greater than, less than), as below:
df1 = sourceDf.filter(sourceDf.col(fieldName).equalTo(value));
df2 = sourceDf.filter(sourceDf.col(fieldName).gt(value));
df3 = sourceDf.filter(sourceDf.col(fieldName).lt(value));
// and so on...
Now, I want to collect all the DataFrames (df1, df2, df3, ...) in one collection or array rather than keeping them as individual variables as above. Please let me know how I can achieve this, as I am new to Java and Apache Spark.

I tried Dataset[] RecordCollection = new Dataset[3]; but it is not allowed.

The exception is: "can't create a generic array of Dataset"
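The compile error comes from Java itself, not Spark: the language forbids creating arrays of a parameterized type like Dataset&lt;Row&gt;. A List avoids the restriction entirely. A minimal sketch of the pattern in plain Java, using List&lt;Integer&gt; as a stand-in for Dataset&lt;Row&gt; (since Spark is not on the classpath here, the element type is an assumption for illustration only):

```java
import java.util.ArrayList;
import java.util.List;

public class DatasetCollection {
    // Generic array creation is illegal in Java:
    //   Dataset<Row>[] frames = new Dataset<Row>[3]; // compile error
    // A List of the generic type works fine. List<Integer> stands in
    // for Dataset<Row> purely so this sketch compiles without Spark.
    static List<List<Integer>> collect(List<Integer> df1,
                                       List<Integer> df2,
                                       List<Integer> df3) {
        List<List<Integer>> frames = new ArrayList<>();
        frames.add(df1);
        frames.add(df2);
        frames.add(df3);
        return frames;
    }

    public static void main(String[] args) {
        List<List<Integer>> frames =
            collect(List.of(1), List.of(2), List.of(3));
        System.out.println(frames.size());
    }
}
```

With Spark on the classpath, the same shape would be List&lt;Dataset&lt;Row&gt;&gt; populated with df1, df2, df3.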
It's not clear what you're trying to accomplish, since the filters you posted exclude nothing when combined. But you can still use union from the API:

Dataset<Row> df = df1.union(df2).union(df3);

or filter using or from the start:
Column c1 = sourceDf.col(fieldName).equalTo(value);
Column c2 = sourceDf.col(fieldName).gt(value);
Column c3 = sourceDf.col(fieldName).lt(value);
df1 = sourceDf.filter(c1.or(c2).or(c3));