JavaRDD<List<String>> documents = StopWordsRemover.Execute(lemmatizedTwits).toJavaRDD().map(new Function<Row, List<String>>() {
@Override
public List<String> call(Row row) throws Exception {
List<String> document = new LinkedList<String>();
for(int i = 0; i<row.length(); i++){
document.add(row.get(i).toString());
}
return document;
}
});
I try make it with use this code, but I get WrappedArray
[[WrappedArray(happy, holiday, beth, hope, wonderful, christmas, wish, best)], [WrappedArray(light, shin, meeeeeeeee, like, diamond)]]
How make it correctly?
You can use getList
method:
Dataset<Row> lemmas = StopWordsRemover.Execute(lemmatizedTwits).select("lemmas");
JavaRDD<List<String>> documents = lemmas.toJavaRDD().map(row -> row.getList(0));
where lemmas
is the name of the column with lemmatized text. If there is only one column (it looks like this is the case) you can skip select
. If you know the index of the column you can skip select
as well and pass index to getList
but it is error prone.
Your current code iterates over the Row
not the field you're trying to extract.
Here's an example with using an excel file :
JavaRDD<String> data = sc.textFile(yourPath);
String header = data.first();
JavaRDD<String> dataWithoutHeader = data.filter(line -> !line.equalsIgnoreCase(header) && !line.isEmpty());
JavaRDD<List<String>> dataAsList = dataWithoutHeader.map(line -> Arrays.asList(line.split(";")));
hope this peace of code help you
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.