Pythonic equivalent of this function?

Question

I have a function to port from another language, could you please help me make it "pythonic"?

Here the function ported in a "non-pythonic" way (this is a bit of an artificial example - every task is associated with a project or "None", we need a list of distinct projects, distinct meaning no duplication of the .identifier property, starting from a list of tasks):

@staticmethod
def get_projects_of_tasks(task_list):

    projects = []
    project_identifiers_seen = {}

    for task in task_list:

        project = task.project

        if project is None:
            continue

        project_identifier = project.identifier

        if project_identifiers_seen.has_key(project_identifier):
            continue

        project_identifiers_seen[project_identifier] = True
        projects.append(project)

    return projects

I have specifically not even started to make it "pythonic" not to start off on the wrong foot (eg list comprehension with "if project.identifier is not None, filter() based on predicate that looks up the dictionary-based registry of identifiers, using set() to strip duplicates, etc.)

EDIT:

Based on the feedback, I have this:

@staticmethod
def get_projects_of_tasks(task_list):

    projects = []
    project_identifiers_seen = set()

    for task in task_list:

        project = task.project

        if project is None:
            continue

        project_identifier = project.identifier

        if project_identifier in project_identifiers_seen:
            continue

        project_identifiers_seen.add(project_identifier)
        projects.append(project)

    return projects

Answer 1

There's nothing massively unPythonic about this code. A couple of possible improvements:

project_identifiers_seen could be a set, rather than a dictionary.
foo.has_key(bar) is better spelled bar in foo
I'm suspicious that this is a staticmethod of a class. Usually there's no need for a class in Python unless you're actually doing data encapsulation. If this is just a normal function, make it a module-level one.

Answer 2

def get_projects_of_tasks(task_list):
    seen = set()
    return [seen.add(task.project.identifier) or task.project #add is always None
            for task in task_list if 
            task.project is not None and task.project.identifier not in seen]

This works because (a) add returns None (and or returns the value of the last expression evaluated) and (b) the mapping clause (the first clause) is only executed if the if clause is True .

There is no reason that it has to be in a list comprehension - you could just as well set it out as a loop, and indeed you may prefer to. This way has the advantage that it is clear that you are just building a list, and what is supposed to be in it.

I've not used staticmethod because there is rarely a need for it. Either have this as a module-level function, or a classmethod .

An alternative is a generator (thanks to @delnan for point this out):

def get_projects_of_tasks(task_list):
    seen = set()
    for task in task_list:
        if task.project is not None and task.project.identifier not in seen:
           identifier = task.project.identifier
           seen.add(identifier)
           yield task.project

This eliminates the need for a side-effect in comprehension (which is controversial), but keeps clear what is being collected.

For the sake of avoiding another if/continue construction, I have left in two accesses to task.project.identifier . This could be conveniently eliminated with the use of a promise library.

This version uses promises to avoid repeated access to task.project.identifier without the need to include an if/continue:

from peak.util.proxies import LazyProxy, get_cache # pip install ProxyTypes

def get_projects_of_tasks(task_list):
    seen = set()
    for task in task_list:
        identifier = LazyProxy(lambda:task.project.identifier) # a transparent promise
        if task.project is not None and identifier not in seen:
           seen.add(identifier)
           yield task.project

This is safe from AttributeErrors because task.project.identifier is never accessed before task.project is checked.

Answer 3

What about:

project_list = {task.project.identifier:task.project for task in task_list if task.project is not None}
return project_list.values()

For 2.6- use dict constructor instead:

return dict((x.project.id, x.project) for x in task_list if x.project).values()

Answer 4

Some say EAFP is pythonic, so:

@staticmethod
def get_projects_of_tasks(task_list):

    projects = {}

    for task in task_list:
        try:
            if not task.project.identifier in projects:
                projects[task.project.identifier] = task.project
        except AttributeError:
            pass

    return projects.values()

of cours an explicit check wouldn't be wrong either, and would of course be better if many tasks have not project.

And just one dict to keep track of seen identifiers and projects would be enough, if the order of the projects matters, then a OrderedDict (python2.7+) could come in handy.

Answer 5

There are already a lot of good answers, and, indeed, you have accepted one! But I thought I would add one more option. A number of people have seen that your code could be made more compact with generator expressions or list comprehensions. I'm going to suggest a hybrid style that uses generator expressions to do the initial filtering, while maintaining your for loop in the final filter.

The advantage of this style over the style of your original code is that it simplifies the flow of control by eliminating continue statements. The advantage of this style over a single list comprehension is that it avoids multiple accesses to task.project.identifier in a natural way. It also handles mutable state (the seen set) transparently, which I think is important.

def get_projects_of_tasks(task_list):
    projects = (task.project for task in task_list)
    ids_projects = ((p.identifier, p) for p in projects if p is not None)

    seen = set()
    unique_projects = []
    for id, p in ids_projects:
        if id not in seen:
            seen.add(id)
            unique_projects.append(p)
    return unique_projects

Because these are generator expressions (enclosed in parenthesis instead of brackets), they don't build temporary lists. The first generator expression creates an iterable of projects; you could think of it as performing the project = task.project line from your original code on all the projects at once. The second generator expression creates an iterable of (project_id, project) tuples. The if clause at the end filters out the None values; (p.identifier, p) is only evaluated if p passes through the filter. Together, these two generator expressions do away with your first two if blocks. The remaining code is essentially the same as yours.

Note also the excellent suggestion from Marcin/delnan that you create a generator using yield . This cuts down further on the verbosity of your code, boiling it down to its essentials:

def get_projects_of_tasks(task_list):
    projects = (task.project for task in task_list)
    ids_projects = ((p.identifier, p) for p in projects if p is not None)

    seen = set()
    for id, p in ids_projects:
        if id not in seen:
            seen.add(id)
            yield p

The only disadvantage -- in case this isn't obvious -- is that if you want to permanently store the projects, you have to pass the resulting iterable to list .

projects_of_tasks = list(get_projects_of_tasks(task_list))

Pythonic equivalent of this function?

Question

5 answers

solution1
7 ACCPTED 2012-05-30 16:16:52

solution2
3 2012-05-30 16:18:49

solution3
3 2012-05-30 16:21:24

solution4
1 2012-05-30 16:30:20

solution5
1 2012-05-30 21:31:37

Pythonic equivalent of this function?

Question

5 answers

solution1 7 ACCPTED 2012-05-30 16:16:52

solution2 3 2012-05-30 16:18:49

solution3 3 2012-05-30 16:21:24

solution4 1 2012-05-30 16:30:20

solution5 1 2012-05-30 21:31:37

solution1
7 ACCPTED 2012-05-30 16:16:52

solution2
3 2012-05-30 16:18:49

solution3
3 2012-05-30 16:21:24

solution4
1 2012-05-30 16:30:20

solution5
1 2012-05-30 21:31:37