Why can't I use the take command on an RDD of CassandraRow?

Question

I am working through some exercises on the DataStax VM.

A Cassandra table is given, and I am supposed to filter it and retrieve the top 5 elements using Spark API functions rather than Cassandra query functions.

This is what I am doing:

// Needed for sc.cassandraTable; the DataStax Spark shell may already import it.
import com.datastax.spark.connector._

val cassRdd = sc.cassandraTable("killr_video", "videos_by_year_title")
val cassRdd2 = cassRdd.filter(r => r.getString("title") >= "T")
println("1: " + cassRdd2)
println("2: " + cassRdd2.count)
println("3: " + cassRdd2.take(5))
println("4: " + cassRdd2.take(5).count)

Results in:

1: MapPartitionsRDD[185] at filter at :19
2: 2250
3: [Lcom.datastax.spark.connector.CassandraRow;@56fd2e09
4: compile error (missing arguments for method count in trait TraversableOnce)

What I expected:

1 and 2: work as expected.
3: returns only one row? I would expect an RDD of 5 Cassandra rows.
4: this isn't the RDD count after 3, so I didn't expect it to work; it looks like some kind of CassandraRow count method that I was not meant to call.

The solution given by DataStax takes the RDD and applies a map transformation to it to extract only the title; on that new RDD of titles it then does the filtering and the take.

OK, that works, but I don't understand why take does not work on an RDD of CassandraRow, or what its result actually is.

val cassRdd2 = cassRdd.map(r => r.getString("title")).filter(t => t >= "T")

I thought the take command would always do the same thing on any RDD, regardless of its contents: take the first x elements and return a new RDD of exactly the same type containing those x elements.

Answer 1

rdd.take(n) actually moves the first n elements to the driver and returns them as an array; see the ScalaDoc. What you saw in 3 is not a single row but the default string representation of that array. If you want to print the elements:

println("3: " + cassRdd2.take(5).toList)

or cassRdd2.take(5).foreach(println). The last line does not work because on arrays the method is called length (or size); count on a Scala collection expects a predicate argument, which is what the compile error is complaining about:

println("4: " + cassRdd2.take(5).length)
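
To make the difference concrete, here is a minimal plain-Scala sketch (no Spark needed) of what you are dealing with once take has handed back a local array. Row is a hypothetical stand-in for CassandraRow, and the titles are made-up sample values, not data from the exercise.

```scala
// Hypothetical stand-in for CassandraRow, for illustration only.
final case class Row(title: String)

object TakeResultDemo {
  // rdd.take(n) returns a local Array[T]; we build one by hand here.
  val taken: Array[Row] = Array(Row("Taxi Driver"), Row("Titanic"), Row("Up"))

  def main(args: Array[String]): Unit = {
    // Arrays use the JVM default toString: a type tag plus a hash,
    // which is why output 3 looked like a single strange row.
    println(taken)

    // Converting to a List (or using mkString) shows the actual elements.
    println(taken.toList)          // List(Row(Taxi Driver), Row(Titanic), Row(Up))
    println(taken.mkString(", "))

    // Number of elements: length (or size), not count.
    println(taken.length)          // 3

    // On Scala collections, count takes a predicate and counts the matches.
    println(taken.count(_.title >= "U"))  // 1
  }
}
```

If you really need an RDD again after take, the local array would have to be redistributed (for example with sc.parallelize), but for 5 elements working on the array directly is usually simpler.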

Answer 2

I mixed something up:

take is an action, so I shouldn't expect an RDD back. (But what is it then? Something binary? Does it have a name? Some kind of collection? Could it also be a single value such as a String or an Int, if that fits?)

On that result I shouldn't use count the way I do on RDDs; instead I should use size, as I would on Java collections. By the way, count is also an action, and calling an action after an action sounds dumb, but it felt so intuitive.
