Classify every instance of an RDD | Apache Spark Scala
I'm starting to work with RDDs and I have some doubts. In my case, I have an RDD and I want to classify its data.
My RDD contains the following:
Array[(String, String)] = Array((data: BD=bd_users,BD_classified,contains_people, rbd: BD=bd_users,BD_classified,contains_people),
(data: BD=bd_users,BD_classified,contains_people,contains_users, user: id=8282bd, BD_USERS,bdd),
(data: BD=bd_experts,BD_exp,contains_exp,contains_adm, rbd: BD=bd_experts,BD_ea,contains_exp,contains_adm),
(data: BD=bd_test,BD_test,contains_acc,contains_tst, rbd: BD=bd_test,BD_test,contains_tst,contains_t))
As you can see, each element of the RDD contains two strings: the first one starts with data and the second one starts with rbd. What I want to do is classify every instance of this RDD as follows:
If the instance contains bd_users & BD_classified -> users
If the instance contains bd_experts & BD_exp -> experts
If the instance contains BD_test -> tests
For this RDD, the output would be something like this:
1. Users
2. Users
3. Experts
4. Test
To do this, I would like to use a map that calls a function for every instance in this RDD, but I don't know how to approach it:
def getGroups(input: String): String = {
  // Here I should use, for example, a match/case to classify these strings?
  ???
}
// x._1 takes the first string of each pair (x(0) does not compile on a Tuple2 in Scala 2)
val rdd_groups = rdd_1.map(x => x._1).map(x => getGroups(x))
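One way to fill in getGroups with match/case is sketched below. It is plain Scala, with a local Seq standing in for rdd_1 so it runs without a Spark cluster. It assumes a record belongs to a group when the listed tags appear as whole tokens (split on "=", ",", ":" and whitespace) in the first string of the pair, rather than as arbitrary substrings. The labels follow the sample output above; the object name GroupClassifier and the "Unknown" fallback are hypothetical.

```scala
object GroupClassifier {
  // Split a record such as "data: BD=bd_users,BD_classified,contains_people"
  // into tokens: data, BD, bd_users, BD_classified, contains_people.
  private def tokens(input: String): Set[String] =
    input.split("[=,:\\s]+").filter(_.nonEmpty).toSet

  // The rules from the question, tried top to bottom; the first matching case wins.
  // "Unknown" is a hypothetical fallback for records that match no rule.
  def getGroups(input: String): String = tokens(input) match {
    case t if t("bd_users") && t("BD_classified") => "Users"
    case t if t("bd_experts") && t("BD_exp")      => "Experts"
    case t if t("BD_test")                        => "Test"
    case _                                        => "Unknown"
  }

  def main(args: Array[String]): Unit = {
    // Local stand-in for rdd_1, using the pairs shown in the question.
    val sample = Seq(
      ("data: BD=bd_users,BD_classified,contains_people", "rbd: BD=bd_users,BD_classified,contains_people"),
      ("data: BD=bd_users,BD_classified,contains_people,contains_users", "user: id=8282bd, BD_USERS,bdd"),
      ("data: BD=bd_experts,BD_exp,contains_exp,contains_adm", "rbd: BD=bd_experts,BD_ea,contains_exp,contains_adm"),
      ("data: BD=bd_test,BD_test,contains_acc,contains_tst", "rbd: BD=bd_test,BD_test,contains_tst,contains_t")
    )
    // Same shape as rdd_1.map(x => x._1).map(x => getGroups(x)) on an RDD.
    sample.map(x => x._1).map(x => getGroups(x)).foreach(println)
    // prints Users, Users, Experts, Test (one per line)
  }
}
```

Because getGroups is an ordinary String => String function, the same call can go inside rdd_1.map on the real RDD. Matching whole tokens avoids accidental hits such as bd_users matching a longer tag like bd_users_old, at the cost of being case-sensitive.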
If you need anything else, or more examples, just let me know. Thanks in advance!
Well, assuming you have an RDD of strings and a classifier already defined:
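import org.apache.spark.rdd.RDD

val rdd: RDD[String] = ???                 // your input RDD
def classify(input: String): String = ???  // your classification logic
rdd.map(input => classify(input))          // one label per input string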
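The template above assumes an RDD[String], while the RDD in the question holds (String, String) pairs. A minimal sketch of bridging the two, assuming only the first ("data: ...") string of each pair should be classified, is to destructure each pair with a pattern-matching function literal. A local Seq stands in for the RDD; PairMapping, classifyPairs and the toy classifier are hypothetical names used only for illustration.

```scala
object PairMapping {
  // Apply a per-string classifier to the first element of each pair.
  // { case (data, _) => ... } destructures the tuple and ignores the second string.
  def classifyPairs(pairs: Seq[(String, String)], classify: String => String): Seq[String] =
    pairs.map { case (data, _) => classify(data) }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(
      ("data: BD=bd_users,BD_classified,contains_people", "rbd: BD=bd_users,BD_classified,contains_people"),
      ("data: BD=bd_test,BD_test,contains_acc,contains_tst", "rbd: BD=bd_test,BD_test,contains_tst,contains_t")
    )
    // Hypothetical toy classifier, only to exercise classifyPairs.
    val toy: String => String = s => if (s.contains("BD_test")) "Test" else "Other"
    println(classifyPairs(pairs, toy)) // prints List(Other, Test)
  }
}
```

On the real RDD the same literal would read rdd_1.map { case (data, _) => classify(data) }, since RDD.map accepts any function from the element type.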