I'm relatively new to Spark and PySpark.
final_plogfiles = plogfiles.filter(lambda x: len(x)>0)
I wrote this code to filter out the empty lines from the RDD plogfiles. It did not remove the empty lines.
I also tried:
plogfiles.filter(lambda x: len(x.split())>0)
But if I use plogfiles.filter(lambda x: x.split()), the leading and trailing whitespace in all lines gets trimmed.
I only want to filter out empty lines. I would like to know where I'm going wrong.
Is plogfiles an RDD? The following works fine for me:
lines = sc.textFile(input_file)
non_empty_lines = lines.filter(lambda x: len(x) > 0)
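If that still leaves lines that look empty, one possible cause (an assumption on my part, since the input data is not shown) is that those lines are not truly empty but contain only whitespace, such as spaces, tabs, or a stray "\r" from Windows line endings. In that case len(x) > 0 is True and the line is kept. Here is a minimal sketch of a predicate that treats whitespace-only lines as empty. The name is_non_blank and the sample data are made up for illustration, and a plain Python list stands in for the RDD so the snippet runs without a cluster:

```python
def is_non_blank(line):
    """Return True if the line has at least one non-whitespace character."""
    # strip() is used only to decide; the original line is never modified.
    return bool(line.strip())


# Hypothetical sample data standing in for the contents of the RDD.
sample_lines = ["  first line  ", "", "   ", "\t", "second line", "\r"]

# Python's built-in filter keeps the elements for which the predicate is truthy.
kept = list(filter(is_non_blank, sample_lines))
print(kept)

# With Spark, assuming plogfiles is an RDD of strings, the same predicate
# can be passed to RDD.filter:
# final_plogfiles = plogfiles.filter(is_non_blank)
```

Note that filter only decides which elements to keep and returns a new RDD, so the surviving lines keep their leading and trailing whitespace. The result has to be assigned to a variable, as in the commented line, because the original RDD is left unchanged.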