
Pyspark filter empty lines from RDD not working

I'm relatively new to Spark and PySpark.

final_plogfiles = plogfiles.filter(lambda x: len(x)>0)

I wrote this code to filter out the empty lines from the RDD plogfiles, but it did not remove them.

I also tried

plogfiles.filter(lambda x: len(x.split())>0)

But if I use plogfiles.filter(lambda x: x.split()), the leading and trailing whitespace in all lines gets trimmed.

I only want to filter out empty lines. I would like to know where I'm going wrong.

Is plogfiles an RDD? The following works fine for me:

lines = sc.textFile(input_file)
non_empty_lines = lines.filter(lambda x: len(x) > 0)
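Note also that filter never modifies elements, it only selects them, so a predicate like x.split() cannot trim anything; the trimming you observed must come from elsewhere. A plain-Python sketch of the same predicates (no Spark needed; the sample lines are made up for illustration):

```python
# RDD.filter has the same semantics as a Python list comprehension's
# condition: it KEEPS elements whose predicate is truthy and never
# changes the elements themselves.
lines = ["first line", "", "   ", "  indented line  "]

# len(x) > 0 drops only truly empty strings; whitespace-only lines survive
non_empty = [x for x in lines if len(x) > 0]

# x.split() is truthy only when the line has non-whitespace content,
# so whitespace-only lines are dropped -- but kept lines are unchanged:
# "  indented line  " retains its leading/trailing spaces
non_blank = [x for x in lines if x.split()]

# x.strip() is an equivalent, more idiomatic predicate for "not blank"
also_non_blank = [x for x in lines if x.strip()]

print(non_empty)   # ['first line', '   ', '  indented line  ']
print(non_blank)   # ['first line', '  indented line  ']
```

If the goal is to drop lines that are empty or whitespace-only, the RDD equivalent would be plogfiles.filter(lambda x: x.strip()).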
