
How to create an array or collection of Dataset&lt;Row&gt; of spark-dataframes type?

I am filtering an Avro file in Spark using Java. I get a different DataFrame for each type of filter condition (equal to, greater than, less than), as below:

Dataset<Row> df1 = sourceDf.filter(sourceDf.col(fieldName).equalTo(value));
Dataset<Row> df2 = sourceDf.filter(sourceDf.col(fieldName).gt(value));
Dataset<Row> df3 = sourceDf.filter(sourceDf.col(fieldName).lt(value));
// and so on...

Now, I want to collect all the DataFrames (df1, df2, df3, ...) in one collection or array rather than keeping them as individual variables as above. Please let me know how I can achieve this, as I am new to Java and Apache Spark.

I tried Dataset[] RecordCollection = new Dataset[3]; but it is not allowed.

The error is: "cannot create a generic array of Dataset".
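(For context: this is Java's "generic array creation" restriction. Arrays are reified at runtime while generics are erased, so new Dataset<Row>[3] cannot compile; a java.util.List sidesteps the problem, and the same pattern works for Spark's Dataset<Row>. A minimal plain-Java sketch, using List<String> as a stand-in element type so it runs without a Spark dependency:)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DatasetCollectionDemo {
    // `new Dataset<Row>[3]` is rejected by the compiler, but a List of the
    // generic type is fine. With Spark this would be List<Dataset<Row>>.
    static List<List<String>> buildCollection() {
        List<List<String>> frames = new ArrayList<>();
        frames.add(Arrays.asList("df1-rows"));   // stand-in for df1
        frames.add(Arrays.asList("df2-rows"));   // stand-in for df2
        frames.add(Arrays.asList("df3-rows"));   // stand-in for df3
        return frames;
    }

    public static void main(String[] args) {
        // All three "DataFrames" are now reachable through one collection.
        System.out.println(buildCollection().size());
    }
}
```

With Spark on the classpath, the equivalent declaration is simply List<Dataset<Row>> frames = new ArrayList<>(); followed by frames.add(df1); and so on.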

It's not clear what you're trying to accomplish, since the filters you post exclude nothing when combined. Still, you can do either of the following:

union from the API:

Dataset<Row> df = df1.union(df2).union(df3);

or filter from the start using or:

Column c1 = sourceDf.col(fieldName).equalTo(value);
Column c2 = sourceDf.col(fieldName).gt(value);
Column c3 = sourceDf.col(fieldName).lt(value);
df1 = sourceDf.filter(c1.or(c2).or(c3));
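(This combined filter illustrates the answer's opening point: "equal to", "greater than", and "less than" on the same value, joined with or, together accept every comparable row. A plain-Java sketch of the same logic using java.util.function.IntPredicate, with a hypothetical value of 10, so it runs without Spark:)

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

public class OrFilterDemo {
    // eq.or(gt).or(lt) mirrors Column#or in Spark: the three conditions
    // partition the number line, so their disjunction keeps everything.
    static List<Integer> filter(List<Integer> rows, int value) {
        IntPredicate eq = x -> x == value;
        IntPredicate gt = x -> x > value;
        IntPredicate lt = x -> x < value;
        IntPredicate combined = eq.or(gt).or(lt);
        return rows.stream().filter(combined::test).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Every input survives the combined filter.
        System.out.println(filter(Arrays.asList(1, 10, 25), 10));
    }
}
```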
