
Count Elements Inside Apache Spark DStream

I need to retrieve the number of elements inside a DStream using Java. Reading the documentation, I have done something like the following:

JavaDStream<Object> stream;

stream.count(); // returns another DStream, not a scalar

It returns a DStream object instead of a number.
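For reference, count() on a JavaDStream yields a JavaDStream<Long> carrying one count per batch interval rather than a single value; a minimal sketch of inspecting it (reusing the stream variable above):

JavaDStream<Long> perBatch = stream.count();
perBatch.print(); // prints the element count of each micro-batch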

How can I get the number of elements in a DStream? I need it in a test suite.

You cannot. A DStream represents an infinite sequence of RDDs, so it is not really meaningful to ask about the total number of elements.
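The closest built-in alternative is a windowed count. A minimal sketch, assuming a JavaStreamingContext named jssc and an import of org.apache.spark.streaming.Durations (the checkpoint path is hypothetical; countByWindow needs checkpointing because it maintains state across batches):

jssc.checkpoint("/tmp/spark-checkpoint"); // hypothetical path
// One running count over the last 30 seconds, re-emitted every 10 seconds.
JavaDStream<Long> windowed =
        stream.countByWindow(Durations.seconds(30), Durations.seconds(10));
windowed.print();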

You can add stateful operations which keep track of the number of values and update it per window, but that is not the same as asking for a count over the whole stream. You can check MapWithStateSuite to see how testing state can be implemented. For example, the size of each micro-batch can be read inside foreachRDD:

topNUrl.foreachRDD { rdd =>
  val count = rdd.count() // number of elements in this batch only
}
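For a test suite, one workable pattern is to push known batches through a queue stream and accumulate the per-batch counts on the driver. A minimal, hedged sketch assuming the Spark 2.x Java API (class and variable names are illustrative):

import java.util.Arrays;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.util.LongAccumulator;

public class CountDStreamElementsTest {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("count-test");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Feed two known test batches through a queue stream.
        Queue<JavaRDD<Integer>> batches = new LinkedList<>();
        batches.add(jssc.sparkContext().parallelize(Arrays.asList(1, 2, 3)));
        batches.add(jssc.sparkContext().parallelize(Arrays.asList(4, 5)));
        JavaDStream<Integer> stream = jssc.queueStream(batches);

        // Driver-side accumulator that collects the per-batch counts.
        LongAccumulator total = jssc.sparkContext().sc().longAccumulator("elements");
        stream.foreachRDD(rdd -> total.add(rdd.count()));

        jssc.start();
        jssc.awaitTerminationOrTimeout(5000); // let the queued batches drain
        jssc.stop(true, true);

        System.out.println("elements seen: " + total.value()); // 5
    }
}

Once the queued batches have drained, the accumulator holds the total number of elements seen, which the test can assert on.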
