Classify every instance of an RDD | Apache Spark Scala

Question

I'm starting to work with RDDs and I have some doubts. In my case, I have an RDD and I want to classify its data. My RDD contains the following:

Array[(String, String)] = Array((data: BD=bd_users,BD_classified,contains_people, rbd: BD=bd_users,BD_classified,contains_people),
(data: BD=bd_users,BD_classified,contains_people,contains_users, user: id=8282bd, BD_USERS,bdd),
(data: BD=bd_experts,BD_exp,contains_exp,contains_adm, rbd: BD=bd_experts,BD_ea,contains_exp,contains_adm),
(data: BD=bd_test,BD_test,contains_acc,contains_tst, rbd: BD=bd_test,BD_test,contains_tst,contains_t))

As you can see, each element of the RDD contains two strings: the first one starts with data and the second one starts with rbd. What I want to do is classify every instance of this RDD as follows:

If the instance contains bd_users & BD_classified -> users
bd_experts & BD_exp -> experts
BD_test -> tests

The output would be something like this for this RDD:

1. Users
2. Users
3. Experts
4. Test

To do this I would like to use a map that calls a function for every instance in this RDD, but I don't know how to approach it:

val rdd_groups = rdd_1.map(x=>x(0).toString).map(x => getGroups(x))
def getGroups(input: String): (String) = {
//here i should use for example case to classify this strings?
}

If you need anything more, or more examples, just tell me. Thanks in advance!

Answer 1

Well, assuming you have an RDD of strings and a classifier already defined:

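  import org.apache.spark.rdd.RDD

  // The RDD of strings to classify (placeholder).
  val rdd: RDD[String] =
    ???

  // Maps one input string to its group label (placeholder).
  def classify(input: String): String =
    ???

  // Apply the classifier to every element of the RDD.
  rdd.map(input => classify(input))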
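
The `classify` placeholder above could be filled in along the following lines. This is only a sketch built from the three rules listed in the question: the object name `ClassifyExample`, the lowercase labels, and the `"unknown"` fallback for strings that match no rule are assumptions, not something the question specifies. It runs on a plain `Seq` so that it works without a Spark session; the same function can be passed to `map` on an RDD.

```scala
object ClassifyExample {
  // Hypothetical classifier following the rules in the question.
  // "unknown" is an assumed fallback; the question does not say what
  // should happen when no rule matches.
  def classify(input: String): String =
    if (input.contains("bd_users") && input.contains("BD_classified")) "users"
    else if (input.contains("bd_experts") && input.contains("BD_exp")) "experts"
    else if (input.contains("BD_test")) "tests"
    else "unknown"

  // Sample records copied from the question: pairs of strings.
  val sample: Seq[(String, String)] = Seq(
    ("data: BD=bd_users,BD_classified,contains_people",
     "rbd: BD=bd_users,BD_classified,contains_people"),
    ("data: BD=bd_users,BD_classified,contains_people,contains_users",
     "user: id=8282bd, BD_USERS,bdd"),
    ("data: BD=bd_experts,BD_exp,contains_exp,contains_adm",
     "rbd: BD=bd_experts,BD_ea,contains_exp,contains_adm"),
    ("data: BD=bd_test,BD_test,contains_acc,contains_tst",
     "rbd: BD=bd_test,BD_test,contains_tst,contains_t")
  )

  // Classify each pair by its first string, using a pattern match
  // to take the pair apart.
  val groups: Seq[String] = sample.map { case (data, _) => classify(data) }
}
```

With the RDD of pairs from the question, the equivalent call would be `rdd_1.map { case (data, _) => classify(data) }`. Pattern matching (or `._1`) is the usual way to read the first element of a pair, rather than `x(0)`.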

Solution 1
0 2021-11-22 14:10:24