Classify every instance of an RDD | Apache Spark Scala

I'm starting to work with RDDs and I have some doubts. In my case, I have an RDD and I want to classify its data. My RDD contains the following:

Array[(String, String)] = Array((data: BD=bd_users,BD_classified,contains_people, rbd: BD=bd_users,BD_classified,contains_people),
(data: BD=bd_users,BD_classified,contains_people,contains_users, user: id=8282bd, BD_USERS,bdd),
(data: BD=bd_experts,BD_exp,contains_exp,contains_adm, rbd: BD=bd_experts,BD_ea,contains_exp,contains_adm),
(data: BD=bd_test,BD_test,contains_acc,contains_tst, rbd: BD=bd_test,BD_test,contains_tst,contains_t))

As you can see, each element of the RDD contains two strings: the first one starts with data and the second one starts with rbd. What I want to do is classify every instance of this RDD according to these rules:

If the instance contains bd_users & BD_classified -> users
bd_experts & BD_exp -> experts
BD_test -> tests

For this RDD, the output would be something like this:

1. Users
2. Users
3. Experts
4. Test

To do this, I would like to use a map that calls a function on every instance of this RDD, but I don't know how to approach it:

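// Assuming rdd_1 is the RDD[(String, String)] shown above, take the first element with _1 rather than x(0).
val rdd_groups = rdd_1.map(x => x._1).map(x => getGroups(x))
def getGroups(input: String): String = {
  // here I should use, for example, a case expression to classify these strings?
  ???
}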

If you need more details or examples, just let me know. Thanks in advance!

Well, assuming you already have an RDD of strings and a classifier defined:

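  import org.apache.spark.rdd.RDD

  // Placeholder: the RDD of input strings to classify.
  val rdd: RDD[String] =
    ???

  // Placeholder: maps one input string to its group label.
  def classify(input: String): String =
    ???

  // Apply the classifier to every element; the result is an RDD[String] of labels.
  rdd.map(input => classify(input))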
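
The classifier itself is left open above, so here is one possible way to fill it in: a minimal sketch in plain Scala (no Spark needed to try it) that encodes the three rules from the question as substring checks. The `Classifier` object name, the lowercase labels, and the "unknown" fallback for inputs that match no rule are my assumptions, not something the question specifies.

```scala
// Hypothetical helper: the object name, labels, and fallback are assumptions.
object Classifier {
  // Each rule pairs the markers that must ALL appear in the input with a label.
  // Rules are checked in order and the first match wins.
  private val rules: Seq[(Seq[String], String)] = Seq(
    Seq("bd_users", "BD_classified") -> "users",
    Seq("bd_experts", "BD_exp")      -> "experts",
    Seq("BD_test")                   -> "tests"
  )

  // Case-sensitive substring matching; returns "unknown" when no rule applies.
  def classify(input: String): String =
    rules
      .collectFirst {
        case (markers, label) if markers.forall(m => input.contains(m)) => label
      }
      .getOrElse("unknown")
}
```

With the pair RDD from the question, this could be applied to the first string of each element, for example `rdd_1.map { case (data, _) => Classifier.classify(data) }`. Keeping the rules in a list rather than a chain of if/else makes it easy to add groups later, but a plain pattern match with guards would work just as well.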
