
Using filter(lambda, list) in Python to clean data

I'm web-scraping a website as a project and am currently cleaning the data. I have a list containing some information/sentences, but some entries are empty and I want to delete them.

My thought was to create a lambda function that returns False for null values and True for non-null values. Then I would pass this function and my list to filter(), so that filter() applies the function to each item and removes the empty strings from the list.


Check for x == "" as well:

f = lambda x: x is not None and x != ""
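
A minimal sketch of how that predicate might be passed to filter(). The sample list `info` is made up for illustration and is not the asker's data.

```python
# Hypothetical sample data standing in for the scraped list.
info = ["", "abc", None, "def", ""]

# Predicate from the answer above: keep anything that is neither None nor "".
f = lambda x: x is not None and x != ""

# filter() returns a lazy iterator in Python 3, so wrap it in list().
cleaned = list(filter(f, info))
print(cleaned)  # ['abc', 'def']
```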

You don't need a lambda here. Passing None as the function makes filter() keep only the truthy items:

lst = ['', 'abc', '', 'def', '', 1, 2, '']

list(filter(None, lst))

Output:

['abc', 'def', 1, 2]
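
One thing to keep in mind: filter(None, ...) drops every falsy item, not only empty strings. The sketch below uses a made-up list to contrast it with an explicit test; whether this matters depends on what the scraped list can contain.

```python
# Hypothetical list mixing empty strings with other falsy values.
lst = ["", "abc", 0, None, False, [], "def", 1]

# filter(None, ...) keeps only truthy items, so 0, False and [] are dropped too.
truthy_only = list(filter(None, lst))
print(truthy_only)  # ['abc', 'def', 1]

# An explicit test removes only None and "" and keeps the other values.
non_empty = [x for x in lst if x is not None and x != ""]
print(non_empty)  # ['abc', 0, False, [], 'def', 1]
```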

You can use the fact that:

  • bool(None) is False
  • bool("") (empty string) is False
  • bool("something") (non-empty string) is True

>>> info = ['', 'abc', '', 'def', '', None]
>>> f = lambda x: bool(x)
>>> list(filter(f, info))
['abc', 'def']
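
Scraped text often contains strings made up only of spaces or newlines, and those are truthy. A small sketch, assuming every non-None item is a string; the sample values are hypothetical.

```python
# Hypothetical scraped values; some contain only whitespace.
info = ["", "  ", "abc", "\n", None, " def "]

# bool("  ") is True, so a plain truthiness filter keeps whitespace-only strings.
kept_by_bool = list(filter(bool, info))
print(kept_by_bool)  # ['  ', 'abc', '\n', ' def ']

# Stripping first treats whitespace-only strings as empty; the guard skips None.
cleaned = [s.strip() for s in info if s is not None and s.strip()]
print(cleaned)  # ['abc', 'def']
```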

You can use a list comprehension instead of filter() and get better performance. In the benchmark below it took less than half the time of either lambda-based filter() call.

res = [elem for elem in Mylist if elem not in (None, '')]

Benchmark:

from timeit import timeit
import random

Mylist = [random.choice(['',None,'a']) for _ in range(100)]

def check_bool():
    f = lambda x: bool(x)
    return list(filter(f, Mylist))

def lambda_if_else():
    f = lambda x: x is not None and x != ""
    return list(filter(f, Mylist))

def list_comprehension():
    return [elem for elem in Mylist if not elem in [None, '']]


for func in [check_bool, lambda_if_else, list_comprehension]:
    print(func.__name__, timeit(f"{func.__name__}()", globals=globals()))
    
print(list_comprehension() == lambda_if_else() == check_bool())

Output (total seconds for timeit's default of 1,000,000 calls per function):

check_bool 21.95354559900079
lambda_if_else 19.536270918999435
list_comprehension 8.683593133999238
True
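
The benchmark above does not time the filter(None, ...) variant from the earlier answer. The sketch below adds it for comparison. The names `data` and `candidates`, the fixed seed, and the reduced iteration count are my own choices, and timings vary by machine and Python version, so no numbers are shown.

```python
import random
from timeit import timeit

random.seed(0)  # fixed seed so every run filters the same data
data = [random.choice(["", None, "a"]) for _ in range(100)]

def filter_none():
    # Truthiness is tested inside filter() itself, with no lambda call per item.
    return list(filter(None, data))

def filter_lambda():
    return list(filter(lambda x: x is not None and x != "", data))

def list_comprehension():
    return [elem for elem in data if elem not in (None, "")]

candidates = [filter_none, filter_lambda, list_comprehension]

# All three should agree on this data, which contains only "", None and "a".
results = [func() for func in candidates]
assert results[0] == results[1] == results[2]

for func in candidates:
    # number=10_000 keeps the run short; timeit's default is 1,000,000 calls.
    print(func.__name__, timeit(func, number=10_000))
```

The three variants only agree because this data contains nothing falsy other than None and ""; with values such as 0 in the list, filter_none() would return a different result.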
