Spark aggregation - nested groups

Question

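I need help with nested grouping. I am very new to Spark and Scala, so I would appreciate your expert input.

I am writing a transformation for a MongoDB collection using Spark, working in IntelliJ IDEA. Here are the details of the collection: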

{
_id:
customer:
product:
location:
date:
transType:
}

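Use case: for each product and each location, list the customers whose transaction type is "ordered".

// The output should look something like this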

{
  Product: ABCD
    location: North america
      customer: Cust 1, type: ordered
        total: 200
}
{
  Product: EFGH
    location: North america
      customer: Cust 2, type: Ordered
        total: 300
}

Here is what I have so far:

val conf = new SparkConf().setAppName("PVL").setMaster("local").
      set("spark.mongodb.input.uri","mongodb://127.0.0.1:27017/product.transactionEvent").
      set("spark.mongodb.output.uri", "mongodb://127.0.0.1:27017/product.transctionResult")
    val sc = new SparkContext(conf)

val rdd = sc.loadFromMongoDB()
val aggRdd = rdd.withPipeline(Seq(
      Document.parse("{$match: {transType: 'ordered'}}"),
      Document.parse("""{ $group: {_id: {prodId: "$prodId", customer: "$customer", location: "$location", Transtype: "$Transtype"}, total: {$sum:1}}}"""),
      Document.parse("""{$group: {_Id: {prodId: "$_id.prodId"}, details: {$addToSet: {customer: "$_id.customer", location: "$_id.location", transType: "$_id.transType", total: "$total"}}}}""")))

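But for some reason this does not work. The error is:

"unknown group operator 'prodId'" on the server

First of all, is this kind of nesting possible in Spark? If so, what am I doing wrong? Any help is greatly appreciated.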

Answer 1

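I figured out what the problem was. In one of the $group statements I had capitalized _id (that is, I wrote _Id). Because _Id is not the reserved _id key, MongoDB treats it as an ordinary output field and tries to read prodId as its accumulator operator, which produces the error. Once I corrected the capitalization, it worked fine.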
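For reference, the corrected stages might look like the sketch below. It keeps them as plain JSON strings so that it runs without any MongoDB dependency; in the question's code each string would still go through Document.parse before being handed to withPipeline. The field names come from the question. Spelling transType consistently (the question's first $group writes Transtype) is my assumption, and the FixedPipeline object is only an illustrative name.

```scala
// Corrected aggregation stages, kept as plain JSON strings (a sketch).
object FixedPipeline {
  // Keep only the "ordered" transactions.
  val matchStage: String =
    """{ $match: { transType: "ordered" } }"""

  // First level: one row per (product, customer, location, type) with a count.
  val groupByAll: String =
    """{ $group: { _id: { prodId: "$prodId", customer: "$customer", location: "$location", transType: "$transType" }, total: { $sum: 1 } } }"""

  // Second level: note the lowercase _id; writing _Id is what triggered the error.
  val groupByProduct: String =
    """{ $group: { _id: { prodId: "$_id.prodId" }, details: { $addToSet: { customer: "$_id.customer", location: "$_id.location", transType: "$_id.transType", total: "$total" } } } }"""

  val stages: Seq[String] = Seq(matchStage, groupByAll, groupByProduct)
}
```

With the connector from the question on the classpath, this would presumably be used as rdd.withPipeline(FixedPipeline.stages.map(s => Document.parse(s))).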

In short: errors such as "unknown group operator" or "<field> should be inside the object" mean that the group operator or field was not recognized. Possible causes are:

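The main _id field of the group is capitalized (for example _Id)
A comma is missing
No group operator is declared for a group field
and so on.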

So check your code for these mistakes.
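Some of these checks can be automated before the pipeline is sent to the server. The following is a minimal sketch, assuming the body of the $group stage is available as a Scala Map. GroupStageCheck and its check method are hypothetical helpers written for this answer, not part of Spark or the MongoDB connector, and they only cover the first and third causes listed above.

```scala
// Hypothetical helper: sanity-check the body of a $group stage.
object GroupStageCheck {
  def check(group: Map[String, Any]): Seq[String] = {
    // The grouping key must be spelled exactly "_id".
    val idProblems =
      if (group.contains("_id")) Seq.empty[String]
      else Seq("missing '_id' field")

    val fieldProblems = group.toSeq.collect {
      // A key such as "_Id" is treated as an ordinary accumulator field.
      case (key, _) if key != "_id" && key.equalsIgnoreCase("_id") =>
        s"'$key' should be spelled '_id'"
      // Every other field needs exactly one $-prefixed accumulator.
      case (key, value) if key != "_id" && !isAccumulator(value) =>
        s"'$key' has no group operator"
    }
    idProblems ++ fieldProblems
  }

  // True when the value looks like { $operator: ... }.
  private def isAccumulator(value: Any): Boolean = value match {
    case m: Map[_, _] =>
      m.size == 1 && m.keys.forall {
        case k: String => k.startsWith("$")
        case _         => false
      }
    case _ => false
  }
}
```

Run against the second $group stage from the question, this would flag the _Id key, while a well-formed stage yields an empty list.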

Thanks

Solution 1
0 Accepted 2017-01-06 19:32:38