
Using filter(lambda, list) in Python to clean data

I'm web-scraping a website as a project and am currently cleaning the data. I have a list of sentences, but some entries are empty and I want to remove them.

My idea was to write a lambda function that returns False for empty (or None) values and True for non-empty ones. I would then pass that function and my list to the built-in filter() function, so that filter() keeps only the items for which the lambda returns True and drops the empty strings.


Check for the empty string (x == "") as well as for None:

f = lambda x: x is not None and x != ""
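In Python 3, filter() returns a lazy iterator rather than a list, so the result has to be wrapped in list() (or looped over) before the cleaned values can be inspected. A minimal sketch, assuming the scraped sentences sit in a hypothetical list named `scraped`:

```python
# Hypothetical scraped data: some entries are empty strings or None
scraped = ["First sentence.", "", None, "Second sentence.", ""]

# Keep x only if it is neither None nor the empty string
f = lambda x: x is not None and x != ""

filtered = filter(f, scraped)   # a filter object (lazy iterator), not a list
cleaned = list(filtered)        # materialize the result

print(cleaned)  # ['First sentence.', 'Second sentence.']
```

Note that a filter object can only be consumed once; calling list() on it a second time gives an empty list.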

You don't need a lambda here; pass None as the function instead:

lst = ['', 'abc', '', 'def', '', 1, 2, '']

list(filter(None, lst))

Output:

['abc', 'def', 1, 2]
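With None as the function, filter() keeps only truthy items, so it removes every falsy value, not just empty strings and None: 0, 0.0, False and empty containers are dropped too. If values such as 0 must survive, the explicit lambda check is the safer choice. A small comparison on a made-up list named `mixed`:

```python
mixed = ["", "abc", 0, None, "def", False]

# filter(None, ...) drops every falsy value
truthy_only = list(filter(None, mixed))

# The explicit check drops only None and ""
not_empty = list(filter(lambda x: x is not None and x != "", mixed))

print(truthy_only)  # ['abc', 'def']
print(not_empty)    # ['abc', 0, 'def', False]
```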

You can use the fact that:

  • bool(None) is False
  • bool("") (empty string) is False
  • bool("something") (non-empty string) is True

>>> info = ['', 'abc', '', 'def', '', None]
>>> f = lambda x: bool(x)
>>> list(filter(f, info))
['abc', 'def']
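Because bool is itself a callable, the lambda wrapper is optional: filter(bool, info) gives the same result as filter(None, info). One caveat for scraped text (an assumption about the data, not something stated in the question): a string containing only whitespace, such as "  " or "\n", is truthy, so it survives this filter. Stripping before testing handles that case. The sketch below assumes every non-None entry is a string:

```python
info = ["", "abc", "  ", "def", "\n", None]

# bool can be passed directly; same result as filter(None, info)
direct = list(filter(bool, info))

# Treat whitespace-only strings as empty too: strip before testing.
# The `x is not None` guard avoids calling .strip() on None.
stripped = [x.strip() for x in info if x is not None and x.strip()]

print(direct)    # ['abc', '  ', 'def', '\n']
print(stripped)  # ['abc', 'def']
```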

You can use a list comprehension instead of filter() and get better performance, since it avoids calling a Python function for every element:

res = [elem for elem in Mylist if elem not in [None, '']]
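A comprehension can express either of the rules used in the other answers: the explicit "not None and not empty" check, or plain truthiness. A sketch on a small sample `Mylist` (assumed to contain only strings and None), with assertions confirming each form matches its filter() counterpart:

```python
Mylist = ["", None, "a", "b", ""]

# Explicit rule: drop only None and "" (same as the lambda in the first answer)
explicit = [elem for elem in Mylist if elem not in (None, "")]

# Truthiness rule: drop every falsy value (same as filter(None, Mylist))
truthy = [elem for elem in Mylist if elem]

assert explicit == list(filter(lambda x: x is not None and x != "", Mylist))
assert truthy == list(filter(None, Mylist))
print(explicit)  # ['a', 'b']
```

On data that also contains 0 or False, the two rules diverge, so pick the one that matches what "empty" means for your data.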

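Benchmark (timeit runs each statement 1,000,000 times by default, so the figures below are total seconds for one million calls on a 100-element list):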

from timeit import timeit
import random

Mylist = [random.choice(['',None,'a']) for _ in range(100)]

def check_bool():
    f = lambda x: bool(x)
    return list(filter(f, Mylist))

def lambda_if_else():
    f = lambda x: x is not None and x != ""
    return list(filter(f, Mylist))

def list_comprehension():
    return [elem for elem in Mylist if elem not in [None, '']]


for func in [check_bool, lambda_if_else, list_comprehension]:
    print(func.__name__, timeit(f"{func.__name__}()", globals=globals()))
    
print(list_comprehension() == lambda_if_else() == check_bool())

check_bool 21.95354559900079
lambda_if_else 19.536270918999435
list_comprehension 8.683593133999238
True
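The benchmark above does not include filter(None, Mylist) from the second answer. The sketch below shows one way to add it; it uses a smaller `number` so it finishes quickly, and it prints no reference timings because results depend on the machine and Python version. The helper name `filter_none` is hypothetical. In CPython, filter(None, ...) tests truthiness in C without a Python-level callback per element, so it may well be competitive with the comprehension, but measure before relying on that.

```python
import random
from timeit import timeit

random.seed(0)  # fixed seed so repeated runs use the same data
Mylist = [random.choice(['', None, 'a']) for _ in range(100)]

def filter_none():
    # Truthiness test with no Python-level function call per element
    return list(filter(None, Mylist))

def list_comprehension():
    return [elem for elem in Mylist if elem not in (None, '')]

def lambda_if_else():
    f = lambda x: x is not None and x != ""
    return list(filter(f, Mylist))

# Sanity check: all three agree on this data (only '', None and 'a' occur)
assert filter_none() == list_comprehension() == lambda_if_else()

for func in (filter_none, list_comprehension, lambda_if_else):
    # number=10_000 calls instead of timeit's default 1,000,000
    seconds = timeit(func, number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s")
```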

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.