Pyspark: RDD with list of tokens to RDD with one token per row
I have a list of lists of tokens, for example:
mylist = [['hello'],
['cat'],
['dog'],
['hey'],
['dog'],
['I', 'need', 'coffee'],
['dance'],
['dream', 'job']]
myRDD = sc.parallelize(mylist)
I am struggling to find the operation that produces an RDD in which each row holds a single token. My desired output is:
[['hello'],
['cat'],
['dog'],
['hey'],
['dog'],
['I'],
['need'],
['coffee'],
['dance'],
['dream'],
['job']]
What is the correct syntax? Thanks.
Just use flatMap, yielding each token wrapped in its own single-element list:
myRDD.flatMap(lambda xs: ([x] for x in xs))
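
To check what that lambda produces without a Spark context, the same logic can be mimicked in plain Python. This is only a local sketch: `flat_map` is a hypothetical helper standing in for `RDD.flatMap` (apply the function to each row, then flatten one level), not part of PySpark.

```python
from itertools import chain

mylist = [['hello'], ['cat'], ['dog'], ['hey'], ['dog'],
          ['I', 'need', 'coffee'], ['dance'], ['dream', 'job']]

def flat_map(func, rows):
    """Hypothetical local stand-in for RDD.flatMap:
    apply func to every row, then flatten the results by one level."""
    return list(chain.from_iterable(func(row) for row in rows))

# Same lambda as in the answer: each token becomes its own one-element list.
result = flat_map(lambda xs: ([x] for x in xs), mylist)
print(result)
```

On a real RDD, calling `myRDD.flatMap(lambda xs: ([x] for x in xs)).collect()` should give the same rows. If plain strings are wanted instead of one-element lists, `lambda xs: xs` would be enough.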