
Is there a design pattern that I can use for applying a list of functions to create machine learning features in python?

I am working on building an address parsing tool in python for labeling address parts. I have a pandas data frame that looks something like this.

import pandas as pd

df = pd.DataFrame({"TOKEN": ['123.', 'Fake', 'street']})

And I've got a number of feature functions that look like this:

def f_ends_in_period(s):
    return 'f_ends_in_period' if s[-1] == "." else ''

def f_numeric(s):
    return 'f_numeric' if any([k.isdigit() for k in s]) else ''

def f_capitalized(s):
    return 'f_capitalized' if s[0].isupper() else ''
...

The feature functions are fairly rigid. A feature function f_blah(s) returns "f_blah" if string s satisfies some condition (namely, condition "blah"), and otherwise returns an empty string. It's a little weird but there's a method to the madness.

For now, I'm simply going down the list:

df['f_ends_in_period'] = df['TOKEN'].apply(f_ends_in_period)
df['f_numeric'] = df['TOKEN'].apply(f_numeric)
df['f_capitalized'] = df['TOKEN'].apply(f_capitalized)

And that works fine, except that every time I want to make a new feature function, I have to type the name of that feature function at least 4 times. That starts to get annoying really fast, especially if I want to create dozens of features.

Is there a standard pattern that I can use to refactor this? I'm not sure exactly what the solution looks like; I'm just looking for suggestions to streamline the process.

You might be interested in this piece of code

from inspect import getmembers, isfunction
import my_module

functions_list = [o for o in getmembers(my_module) if isfunction(o[1])]

It returns a list of tuples, each pairing a function's name (as a string) with the function object, for every function in the module.
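To make the shape of that list concrete, here is a minimal sketch. It assumes the feature functions live in a module called `my_module`; so that the snippet runs on its own, that module is simulated with `types.ModuleType`, and `helper` is a hypothetical non-feature function added only for illustration.

```python
from inspect import getmembers, isfunction
from types import ModuleType


def f_ends_in_period(s):
    return 'f_ends_in_period' if s.endswith('.') else ''


def f_numeric(s):
    return 'f_numeric' if any(k.isdigit() for k in s) else ''


def helper(s):
    # Hypothetical utility that is not a feature function.
    return s.strip()


# Stand-in for `import my_module` (assumption: in real code the
# functions would simply be defined inside that module).
my_module = ModuleType('my_module')
for func in (f_ends_in_period, f_numeric, helper):
    setattr(my_module, func.__name__, func)

# getmembers yields (name, object) pairs sorted by name;
# isfunction keeps only the plain Python functions.
functions_list = [o for o in getmembers(my_module) if isfunction(o[1])]
names = [name for name, _ in functions_list]
```

Note that `helper` is picked up as well: `isfunction` accepts every function in the module, including ones imported into it, so some further filtering is usually needed.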

A Pythonic solution is to keep the functions in a list and use a comprehension to build the list of results. Here's an example that could get you on the way:

def f_ends_in_period(s):
    return 'f_ends_in_period' if s[-1] == "." else ''

def f_numeric(s):
    return 'f_numeric' if any([k.isdigit() for k in s]) else ''

funcs = [f_ends_in_period, f_numeric]

s = '123.'  # an example token
result = [f(s) for f in funcs]

You'd only have to define each function once and then add it to the funcs list.
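Applied to the DataFrame from the question, the same idea could look like the sketch below. It assumes that each function's `__name__` is an acceptable column name, which holds for the `f_blah` convention described in the question; the function bodies are lightly adapted so that an empty token does not raise an error.

```python
import pandas as pd


def f_ends_in_period(s):
    return 'f_ends_in_period' if s.endswith('.') else ''


def f_numeric(s):
    return 'f_numeric' if any(k.isdigit() for k in s) else ''


def f_capitalized(s):
    return 'f_capitalized' if s[:1].isupper() else ''


funcs = [f_ends_in_period, f_numeric, f_capitalized]

df = pd.DataFrame({"TOKEN": ['123.', 'Fake', 'street']})

# Each function's __name__ doubles as the column name, so a new
# feature only has to be defined once and added to `funcs`.
for f in funcs:
    df[f.__name__] = df['TOKEN'].apply(f)
```

With this loop, adding a feature means typing its name twice (the definition and the list entry) instead of four times.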

Edit: you could combine Elmex80s's answer with mine and build the function list programmatically, perhaps by filtering on the function name. For instance, get all functions in the module whose names start with "validate_", and then iterate through that list with the code above.
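A rough sketch of that combination follows, using the `f_` prefix from the question rather than "validate_". The module name `features`, the helper `collect_feature_functions`, and `load_tokens` are all hypothetical names introduced for illustration, and the module is again simulated with `types.ModuleType` so the snippet is self-contained.

```python
from inspect import getmembers, isfunction
from types import ModuleType


def f_ends_in_period(s):
    return 'f_ends_in_period' if s.endswith('.') else ''


def f_capitalized(s):
    return 'f_capitalized' if s[:1].isupper() else ''


def load_tokens():
    # Not a feature function; the prefix filter should skip it.
    return ['123.', 'Fake', 'street']


# Stand-in for a real module of feature functions (assumption).
features = ModuleType('features')
for func in (f_ends_in_period, f_capitalized, load_tokens):
    setattr(features, func.__name__, func)


def collect_feature_functions(module, prefix='f_'):
    """Return the functions in `module` whose names start with `prefix`."""
    return [func for name, func in getmembers(module, isfunction)
            if name.startswith(prefix)]


funcs = collect_feature_functions(features)
tokens = load_tokens()

# One list of feature values per function, keyed by the function's name.
columns = {f.__name__: [f(tok) for tok in tokens] for f in funcs}
```

With this arrangement a new feature only needs a definition that follows the naming convention; nothing else has to be updated by hand.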
