How to remove empty rows from a PySpark RDD

I have a few empty rows in an RDD that I want to remove. How can I do it?

I tried the following, but it is not working; I still get the empty rows:

json_cp_rdd = xform_rdd.map(lambda (key, value): get_cp_json_with_planid(key, value)).filter(
            lambda x: x is not None).filter(
            lambda x: x is not '')

[u'', u'', u'', ..., u'', u'[{ "PLAN_ID": "d2031aed-175f-4346-af31-9d05bfd4ea3a", "CostTotalInvEOPAmount": 0.0, "StoreCount": 0, "WeekEndingData": "2017-07-08", "UnitTotalInvBOPQuantity": 0.0, "PriceStatus": 1, "UnitOnOrderQuantity": null, "CostTotalInvBOPAmount": 0.0, "RetailSalesAmount": 0.0, "UnitCostAmount": 0.0, "CostReceiptAmount": 0.0, "CostSalesAmount": 0.0, "UnitSalesQuantity": 0.0, "UnitReceiptQuantity": 0.0, "UnitTotalInvEOPQuantity": 0.0, "CostOnOrderAmount": null}]', u'', u'', u'', u'', u'', u'', u'', u'', u'']

is checks object identity, not equality. In Python 2.x you could use != instead:

.filter(lambda x: x is not None).filter(lambda x: x != "")

but idiomatically you can use a single filter with the identity function:

.filter(lambda x: x)

or directly with bool:

.filter(bool)
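
To see that the three predicates above keep the same elements, here is a minimal sketch in plain Python. It assumes that the predicate passed to an RDD's filter is applied to each element the same way the built-in filter applies it, so no Spark session is needed; the records list is made-up sample data, not the asker's real RDD.

```python
# Hypothetical sample resembling the RDD contents: empty strings, None, one JSON payload.
records = ["", None, '[{"PLAN_ID": "d2031aed-175f-4346-af31-9d05bfd4ea3a"}]', "", ""]

# Explicit version: rule out None, then compare by value with != (not by identity).
explicit = [x for x in records if x is not None and x != ""]

# Truthiness versions: None and "" are both falsy, so one predicate covers both cases.
by_lambda = list(filter(lambda x: x, records))
by_bool = list(filter(bool, records))

# All three approaches should agree on which elements survive.
print(explicit == by_lambda == by_bool)
print(len(by_bool))
```

One design point to keep in mind: a truthiness filter also drops other falsy values such as 0 or an empty list, so the explicit != "" form is the safer choice if such values can legitimately appear in the data.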

I replaced filter(lambda x: x is not '') with filter(lambda x: x is not u'') and that solved it.
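
An identity-based check like this depends on the interpreter reusing one object for every empty string, which is an implementation detail rather than a language guarantee. The sketch below illustrates the difference between identity and equality with two equal strings built at runtime, and shows a value-based check; is_blank is a hypothetical helper name introduced only for this example.

```python
# Two strings with equal contents, built at runtime so they are separate objects.
parts = ["plan", "_id"]
a = "".join(parts)
b = "".join(parts)

print(a == b)  # equality compares contents
print(a is b)  # identity compares objects; CPython typically reports False here


def is_blank(value):
    """Value-based emptiness check that does not rely on object identity."""
    return value is None or value == ""


# The helper can be negated inside a filter predicate to keep non-blank values.
kept = [x for x in ["", None, "row"] if not is_blank(x)]
print(kept)
```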

