简体   繁体   中英

How convert JavaRDD<Row> to JavaRDD<List<String>>?

JavaRDD<List<String>> documents = StopWordsRemover.Execute(lemmatizedTwits).toJavaRDD().map(new Function<Row, List<String>>() {
    @Override
    public List<String> call(Row row) throws Exception {
        List<String> document = new LinkedList<String>();
        for(int i = 0; i<row.length(); i++){
            document.add(row.get(i).toString());
        }
        return  document;
    }
});

I try make it with use this code, but I get WrappedArray

[[WrappedArray(happy, holiday, beth, hope, wonderful, christmas, wish, best)], [WrappedArray(light, shin, meeeeeeeee, like, diamond)]]

How make it correctly?

You can use getList method:

Dataset<Row> lemmas = StopWordsRemover.Execute(lemmatizedTwits).select("lemmas");
JavaRDD<List<String>> documents = lemmas.toJavaRDD().map(row -> row.getList(0));

where lemmas is the name of the column with lemmatized text. If there is only one column (it looks like this is the case) you can skip select . If you know the index of the column you can skip select as well and pass index to getList but it is error prone.

Your current code iterates over the Row not the field you're trying to extract.

Here's an example with using an excel file :

JavaRDD<String> data = sc.textFile(yourPath);
        
String header = data.first();

JavaRDD<String> dataWithoutHeader = data.filter(line -> !line.equalsIgnoreCase(header) && !line.isEmpty());

JavaRDD<List<String>> dataAsList = dataWithoutHeader.map(line -> Arrays.asList(line.split(";")));

hope this peace of code help you

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM