
Pyspark filter empty lines from RDD not working

I'm relatively new to Spark and PySpark.

final_plogfiles = plogfiles.filter(lambda x: len(x)>0)

I wrote this code to filter out the empty lines from the RDD plogfiles, but it did not remove them.

I also tried

plogfiles.filter(lambda x: len(x.split())>0)

But if I use plogfiles.filter(lambda x: x.split()), the leading and trailing whitespace in all lines gets trimmed.

I only want to filter out empty lines. I would like to know where I'm going wrong.

Is plogfiles an RDD? The following works fine for me:

lines = sc.textFile(input_file)
non_empty_lines = lines.filter(lambda x: len(x) > 0)
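Note also that filter never modifies elements, it only selects them, so a predicate like x.split() cannot trim anything; the trimming you observed must come from elsewhere. A plain-Python sketch of the same predicates (no Spark needed; the sample lines are made up for illustration):

```python
# RDD.filter has the same semantics as a Python list comprehension's
# condition: it KEEPS elements whose predicate is truthy and never
# changes the elements themselves.
lines = ["first line", "", "   ", "  indented line  "]

# len(x) > 0 drops only truly empty strings; whitespace-only lines survive
non_empty = [x for x in lines if len(x) > 0]

# x.split() is truthy only when the line has non-whitespace content,
# so whitespace-only lines are dropped -- but kept lines are unchanged:
# "  indented line  " retains its leading/trailing spaces
non_blank = [x for x in lines if x.split()]

# x.strip() is an equivalent, more idiomatic predicate for "not blank"
also_non_blank = [x for x in lines if x.strip()]

print(non_empty)   # ['first line', '   ', '  indented line  ']
print(non_blank)   # ['first line', '  indented line  ']
```

If the goal is to drop lines that are empty or whitespace-only, the RDD equivalent would be plogfiles.filter(lambda x: x.strip()).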
