Using filter(lambda, list) in Python to clean data
I'm scraping a website as a project and am currently cleaning the data. I have a list containing some information/sentences, but some entries are empty and I want to delete them.
My idea was to create a lambda function that returns False for null values and True for non-null values. I would then pass this function, together with my list, to filter(), so that filter() applies it and removes the empty strings from the list.
Check for x == "" as well as None:
f = lambda x: x is not None and x != ""
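As a minimal sketch of how this predicate might be plugged into filter(): the list name `info` and its contents are made up here for illustration and stand in for the scraped data.

```python
# Hypothetical sample data standing in for the scraped list.
info = ['', 'abc', None, 'def', '']

# Keep only items that are neither None nor the empty string.
f = lambda x: x is not None and x != ""

# filter() is lazy in Python 3, so wrap it in list() to get the result.
cleaned = list(filter(f, info))
print(cleaned)  # ['abc', 'def']
```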
You don't need a lambda here. Use this:
lst = ['', 'abc', '', 'def', '', 1, 2, '']
list(filter(None, lst))
Output:
['abc', 'def', 1, 2]
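One thing worth keeping in mind: filter(None, ...) keeps only truthy items, so it also drops values such as 0 or an empty list. The sketch below uses an invented list to show the difference from an explicit "not None and not empty string" test; whether that difference matters depends on what the scraped data can contain.

```python
# Invented sample list that mixes empty strings with other falsy values.
lst = ['', 'abc', 0, 'def', None, 1, [], 0.0]

# filter(None, ...) removes every falsy item, not just '' and None.
truthy_only = list(filter(None, lst))

# An explicit predicate removes only None and the empty string.
non_empty = list(filter(lambda x: x is not None and x != '', lst))

print(truthy_only)  # ['abc', 'def', 1]
print(non_empty)    # ['abc', 0, 'def', 1, [], 0.0]
```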
You can use the fact that:
bool(None) is False
bool("") (empty string) is False
bool("something") (non-empty string) is True
>>> info = ['', 'abc', '', 'def', '', None]
>>> f = lambda x: bool(x)
>>> list(filter(f, info))
['abc', 'def']
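Since the lambda only forwards to bool(), the built-in can probably be passed directly. A short sketch, reusing the same `info` list, suggesting that the three spellings give the same result:

```python
info = ['', 'abc', '', 'def', '', None]

with_lambda = list(filter(lambda x: bool(x), info))
with_bool = list(filter(bool, info))   # bool itself serves as the predicate
with_none = list(filter(None, info))   # None means "keep the truthy items"

print(with_lambda == with_bool == with_none)  # True
print(with_bool)  # ['abc', 'def']
```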
You can use a list comprehension instead of filter; it was the fastest option in the benchmark below.
res = [elem for elem in Mylist if not elem in [None, '']]
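A self-contained version of the same idea, with a made-up `Mylist` so it can be run on its own. It uses the `elem not in (...)` spelling, which is equivalent to `not elem in [...]` and is the form usually preferred for readability.

```python
# Made-up sample data; the real list would come from the scraper.
Mylist = ['', 'abc', None, 'def', '']

# Keep every element that is neither None nor the empty string.
res = [elem for elem in Mylist if elem not in (None, '')]

print(res)  # ['abc', 'def']
```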
Benchmark:
from timeit import timeit
import random

Mylist = [random.choice(['',None,'a']) for _ in range(100)]

def check_bool():
    f = lambda x: bool(x)
    return list(filter(f, Mylist))

def lambda_if_else():
    f = lambda x: x is not None and x != ""
    return list(filter(f, Mylist))

def list_comprehension():
    return [elem for elem in Mylist if not elem in [None, '']]

for func in [check_bool, lambda_if_else, list_comprehension]:
    print(func.__name__, timeit(f"{func.__name__}()", globals=globals()))

print(list_comprehension() == lambda_if_else() == check_bool())
check_bool 21.95354559900079
lambda_if_else 19.536270918999435
list_comprehension 8.683593133999238
True
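The benchmark above does not include the filter(None, ...) variant from the other answer. The sketch below is one way it could be timed next to the list comprehension; the function name `filter_none` and the reduced `number` are choices made here, not part of the original benchmark, and no timings are shown because they depend on the machine and Python version.

```python
from timeit import timeit
import random

# Same shape of test data as the benchmark above (100 items).
Mylist = [random.choice(['', None, 'a']) for _ in range(100)]

def filter_none():
    # No Python-level predicate: filter() tests truthiness directly.
    return list(filter(None, Mylist))

def list_comprehension():
    return [elem for elem in Mylist if elem not in (None, '')]

# number is lowered from timeit's default of 1,000,000 to keep the run short.
for func in (filter_none, list_comprehension):
    print(func.__name__, timeit(func, number=100_000))

# Both approaches should agree on this data.
print(filter_none() == list_comprehension())  # True
```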