How to use flatMap in Java with custom objects
I'm trying to flatten an RDD that contains multiple lists of PageLink objects (PageLink is a custom class).
This is what I want to do:
JavaRDD<List<PageLink>> lines = sc.textFile(args[0])
    .filter(s -> s.startsWith("INSERT INTO")) // Keep only the INSERT INTO lines
    .map(s -> s.substring(31)) // Strip the 31-character prefix 'INSERT INTO `pagelinks` VALUES '
    .map(s -> getValues(s)); // Parse each line into a List<PageLink>
JavaRDD<PageLink> pageLinks = lines.flatMap(); // What should be passed to flatMap()?
This is my PageLink class:
package me.dekimpe.types;
import java.io.Serializable;
/**
*
* @author Coreuh
*/
public class PageLink implements Serializable {

    private int pl_id;
    private String pl_title;

    public int getId() {
        return pl_id;
    }

    public String getTitle() {
        return pl_title;
    }

    public void setId(int pl_id) {
        this.pl_id = pl_id;
    }

    public void setTitle(String pl_title) {
        this.pl_title = pl_title;
    }

    @Override
    public String toString() {
        return "Pagelink : {'pl_id': " + this.pl_id + ", 'pl_title': '" + this.pl_title + "'}";
    }
}
I want to do this because I want to create a DataFrame from the resulting PageLinks:
Dataset<Row> df = spark.createDataFrame(pageLinks, PageLink.class);
df.limit(100).show();
You need to return an iterator from the function you pass to .flatMap(). In the Spark 2.x Java API that function must return an Iterator, not the collection itself:
JavaRDD<PageLink> pageLinks = lines.flatMap(list -> list.iterator());
If you want to do it in a single statement, the last call when computing lines can be flatMap() instead of map(), for example .flatMap(s -> getValues(s).iterator()). The result is then a JavaRDD<PageLink> directly, with no intermediate JavaRDD<List<PageLink>>.
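To try the single-statement shape without a Spark cluster, here is a minimal sketch of the same pipeline using plain Java streams. This is an analogy, not the Spark API: Stream.flatMap expects a Stream where Spark's flatMap expects an Iterator. The getValues() shown here is a hypothetical stand-in (the original was not posted), and it assumes a simplified two-field tuple format such as (1,'Foo') with no commas, quotes or parentheses inside titles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapSketch {

    // Minimal stand-in for the PageLink class from the question.
    static class PageLink {
        final int id;
        final String title;

        PageLink(int id, String title) {
            this.id = id;
            this.title = title;
        }

        @Override
        public String toString() {
            return id + ":" + title;
        }
    }

    // Hypothetical stand-in for getValues(): turns "(1,'Foo'),(2,'Bar');"
    // into a list of PageLinks. Assumes the simplified (id,'title') format.
    static List<PageLink> getValues(String values) {
        List<PageLink> result = new ArrayList<>();
        String body = values.trim();
        if (body.endsWith(";")) {
            body = body.substring(0, body.length() - 1);
        }
        // Drop the outermost parentheses, then split between tuples.
        body = body.substring(1, body.length() - 1);
        for (String tuple : body.split("\\),\\(")) {
            String[] fields = tuple.split(",");
            int id = Integer.parseInt(fields[0].trim());
            String title = fields[1].trim().replace("'", "");
            result.add(new PageLink(id, title));
        }
        return result;
    }

    // Same steps as in the question, with the final map() replaced by
    // flatMap() so that each line's list is flattened as it is produced.
    static List<PageLink> parse(List<String> lines) {
        return lines.stream()
            .filter(s -> s.startsWith("INSERT INTO"))
            .map(s -> s.substring(31))
            .flatMap(s -> getValues(s).stream())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "-- a comment line that is filtered out",
            "INSERT INTO `pagelinks` VALUES (1,'Foo'),(2,'Bar');",
            "INSERT INTO `pagelinks` VALUES (3,'Baz');");
        System.out.println(parse(lines)); // prints [1:Foo, 2:Bar, 3:Baz]
    }
}
```

Two input lines holding two and one tuples come out as one flat list of three PageLinks, which is the same effect flatMap() has on the RDD.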