
How to remove empty rows from a PySpark RDD

I have a few empty rows in an RDD that I want to remove. How can I do it?

I tried the code below, but it is not working; I still get the empty rows:

json_cp_rdd = xform_rdd.map(lambda (key, value): get_cp_json_with_planid(key, value)).filter(
            lambda x: x is not None).filter(
            lambda x: x is not '')

[u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'[{ "PLAN_ID": "d2031aed-175f-4346-af31-9d05bfd4ea3a", "CostTotalInvEOPAmount": 0.0, "St oreCount": 0, "WeekEndingData": "2017-07-08", "UnitTotalInvBOPQuantity": 0.0, "PriceStatus": 1, "UnitOnOrderQuantity": null, "CostTotalInvBOPAmount": 0.0, "RetailSalesAmount": 0.0, "UnitCostAmount": 0.0, "CostReceiptAmount": 0.0, "CostSalesAmount": 0.0, "UnitSalesQuantity": 0.0, "UnitReceiptQuantity": 0.0, "UnitTotalInvEOPQuantity": 0.0, "CostOnOrderAmount": null}]', u'', u'', u'', u'', u'', u'', u'', u'', u'']

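The "is" operator checks object identity, not equality, so "x is not ''" can be true even when x is an empty string (in Python 2.x, u'' is a different object from ''). In Python 2.x you can use != instead: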

.filter(lambda x: x is not None).filter(lambda x: x != "")
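
To make the identity-versus-equality point concrete, here is a minimal sketch in plain Python 3 (no Spark needed). The helper name is_non_empty and the sample rows are assumptions made for illustration; they are not from the question. Lists are used for the identity check because two separately built lists are guaranteed to be distinct objects.

```python
# Two distinct list objects with equal contents: equal, but not identical.
a = [1, 2]
b = [1, 2]
equal = (a == b)        # compares contents
identical = (a is b)    # compares object identity

def is_non_empty(x):
    """Hypothetical helper: keep a row only if it is neither None nor ""."""
    return x is not None and x != ""

# Illustrative rows shaped like the output in the question.
rows = [u"", None, u'[{"PLAN_ID": "example"}]', u""]
kept = list(filter(is_non_empty, rows))
# With PySpark, the same predicate could be passed as rdd.filter(is_non_empty).
```

Here equal is True while identical is False, and kept contains only the JSON string.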

but idiomatically you can use a single filter with the identity function, since both None and empty strings are falsy:

.filter(lambda x: x)

or directly with bool:

.filter(bool)
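
One thing to keep in mind with a truthiness filter is that it drops every falsy value (0, empty lists, and so on), while whitespace-only strings are kept. The sketch below shows this in plain Python 3; is_non_blank is a hypothetical helper and the sample rows are made up for illustration. It assumes Python 3, where every text string is a str.

```python
# Illustrative rows: empty string, None, whitespace only, zero, and real data.
rows = [u"", None, u"   ", 0, u'[{"PLAN_ID": "example"}]']

# bool drops every falsy value: "", None and 0 all disappear,
# but the whitespace-only string is truthy and stays.
truthy_only = list(filter(bool, rows))

def is_non_blank(x):
    """Hypothetical helper: keep only strings with non-whitespace content."""
    return isinstance(x, str) and x.strip() != ""

non_blank = list(filter(is_non_blank, rows))
# With PySpark, either predicate could be passed to rdd.filter(...).
```

If the RDD can contain blank-but-not-empty rows, a stricter predicate like is_non_blank may be closer to what is wanted than bool alone.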

I replaced filter(lambda x: x is not '') with filter(lambda x: x is not u''), and that solved it.


 