I have a pyspark dataframe which contains string json. Looks like below:
+-------------------------------------------------------------+
|col                                                          |
+-------------------------------------------------------------+
|{"fields":{"list1":[{"list2":[{"list3":[{"type":false}]}]}]}}|
+-------------------------------------------------------------+
I wrote UDFs to parse the JSON and count the entries whose "type" matches the phone value, returning the count in a new column of the DataFrame:
import json
from pyspark.sql import functions as F
from pyspark.sql import types as t

def item_count(json, type):
    count = 0
    for i in json.get("fields", {}).get("list1", []):
        for j in i.get("list2", []):
            for k in j.get("list3", []):
                count += k.get("type", None) == type
    return count

def item_phone_count(json):
    return item_count(json, False)

df2 = df.withColumn(
    'item_phone_count',
    F.udf(lambda j: item_phone_count(json.loads(j)), t.StringType())('col')
)
But I got the error:
AttributeError: 'NoneType' object has no attribute 'get'
Any idea what's wrong?
Check for None and skip those entries:
def item_count(json, type):
    count = 0
    if (json is None) or (json.get("fields", {}) is None):
        return count
    for i in json.get("fields", {}).get("list1", []):
        if i is None:
            continue
        for j in i.get("list2", []):
            if j is None:
                continue
            for k in j.get("list3", []):
                if k is None:
                    continue
                count += k.get("type", None) == type
    return count
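You can verify the guarded function outside Spark before wiring it into the UDF. Below is a minimal pure-Python sketch of the same logic; the parameters are renamed to `data` and `match` to avoid shadowing the `json` module and the `type` builtin, and the sample row adds a `null` entry to exercise the None checks:

```python
import json

def item_count(data, match):
    # Count list3 entries whose "type" equals match, skipping None entries
    count = 0
    if (data is None) or (data.get("fields", {}) is None):
        return count
    for i in data.get("fields", {}).get("list1", []):
        if i is None:
            continue
        for j in i.get("list2", []):
            if j is None:
                continue
            for k in j.get("list3", []):
                if k is None:
                    continue
                count += k.get("type", None) == match
    return count

# Sample row from the question, plus a null entry that previously raised
# AttributeError: 'NoneType' object has no attribute 'get'
row = '{"fields":{"list1":[{"list2":[{"list3":[{"type":false},null]}]}]}}'
print(item_count(json.loads(row), False))  # 1
```

One more thing worth checking: since the UDF returns an integer count, declare it with `t.IntegerType()` rather than `t.StringType()`, otherwise Spark will not convert the result as you expect.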