
Clean pipeline of operations in python

I have a long pipeline which applies various operations to a list of strings, input_list. The pipeline maps each word to lowercase, replaces underscores, filters out a specific word, removes duplicates, and clips to a certain length.

result = list(set(filter(lambda x : x != word, map(lambda x : x.lower().replace('_',' '), input_list))))[:clip_length]

My problem with this is that it's not very readable: it's not clear what the input to this pipeline is or in what order the operations are applied. It hurts to look at a bit, and I probably won't know what it does later on unless it's been nicely commented.

Is there any way to write a pipeline in python where I can clearly see which operations happen in what order, what goes in and what goes out? To be more specific, I'd like to be able to write it so that operations go either right-to-left or left-to-right, not inner-to-outer.

That's functional style, which you read from the innermost expression towards the outermost.

Putting it on multiple lines with some comments can help readability:

result = list(                                # (5) convert to list
  set(                                        # (4) convert to set (remove dupes)
    filter(
      lambda x: x != word,                    # (3) keep items not equal to word
      map(
        lambda x: x.lower().replace('_',' '), # (2) apply transformation
        input_list                            # (1) take input_list
      )
    )
  )
)[:clip_length]                               # (6) limit number of results

It's a matter of taste. I tend to prefer single expressions like this, with just enough formatting to make them fit nicely:

result = list(set(filter(lambda x : x != word,
    map(lambda x : x.lower().replace('_',' '), input_list))))[:clip_length]

An equivalent imperative-style processing is:

result = set()
for x in input_list:
    x = x.lower().replace('_', ' ')
    if x != word:
        result.add(x)
result = list(result)[:clip_length]
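A middle ground between the nested expression and the explicit loop is to name each stage, so the steps read top to bottom. The following is only a sketch: the sample values for input_list, word and clip_length are made up for illustration, and the intermediate names (normalized, kept, deduped) are not from the question.

```python
# Hypothetical sample values, chosen only for illustration.
input_list = ['Foo_Bar', 'foo bar', 'SPAM', 'Eggs', 'spam']
word = 'spam'
clip_length = 2

# Each stage is a lazy generator; nothing runs until set() consumes them.
normalized = (x.lower().replace('_', ' ') for x in input_list)
kept = (x for x in normalized if x != word)
deduped = set(kept)                       # duplicates removed, order not guaranteed
result = list(deduped)[:clip_length]

print(sorted(result))
```

Because the set does not preserve order, the example prints the sorted result rather than relying on a particular ordering.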

Well it's functional, but it has no (consistent) style. The "problem" is the wide variety of syntaxes used for these expressions.

  • calling a func is done with normal prefix notation f(arg)
  • getting a sub array uses a special syntax arr[n?:m?] , instead of a function slice(n,m)
  • set is a completely different type, but it is used as an intermediate step because sets happen to have some of the behavior we want. What we actually want is the "unique" elements of an iterable, so our function should be called unique. If we happen to implement unique using a set, that's fine, but that is not the concern of the reader, whose mind should be free from such distractions
  • x.lower() is a dynamic call with lower in infix position. Compare to prefix position lower(x) . The same applies for s.replace(pat,rep) vs replace(s, pat, rep)
  • map and filter however do have a functional interface map(f,iter) and filter(f,iter)
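For comparison, the standard library already offers prefix-style counterparts for some of the syntaxes listed above. This is a small sketch using str methods called through the class, operator.methodcaller, and itertools.islice; the sample string and the name underscores_to_spaces are made up for illustration.

```python
from itertools import islice
from operator import methodcaller

s = 'Hello_World'

# A method can be called in prefix position through its class.
assert str.lower(s) == s.lower()
assert str.replace(s, '_', ' ') == s.replace('_', ' ')

# methodcaller builds a one-argument function from a method name and arguments.
underscores_to_spaces = methodcaller('replace', '_', ' ')
print(underscores_to_spaces(s))

# islice is a function counterpart to the arr[:n] slice syntax,
# and it works on any iterable, not only on sequences.
print(list(islice(['a', 'b', 'c', 'd'], 2)))
```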

But writing a program like the one you've shared sort of misses out on functional style's strongest and most versatile trait: the function. Yes, functional programming is also about composing beautiful chains of expressions, but not at the cost of readability! If readability starts to hurt, make it better with... a function :D

Consider this program that uses a uniform functional style. It's still a regular python program.

def program (word = '', clip_length = 5, input = ''):
  make_words = \
    compose ( lower
            , partial (replace, '_', ' ')
            )

  process = \
    compose ( partial (map, make_words)
            , partial (filter, lambda x: x != word)
            , unique
            , partial (take, clip_length)
            )

  return process (input)

print (program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e'))
# ['d', ' ', 'e', 'a']
# Note, your output may vary. More on this later.

And now the dependencies. Each function operates solely on its arguments and returns an output.

def partial (f, *xs):
  return lambda *ys: f (*xs, *ys)

def compose (f = None, *fs):
  def comp (x):
    if f is None:
      return x
    else:
      return compose (*fs) (f (x))
  return comp

def take (n = 0, xs = []):
  return xs [:n]

def lower (s = ''):
  return s .lower ()

def replace (pat = '', rep = '', s = ''):
  return s .replace (pat, rep)

def unique (iter):
  return list (set (iter))
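If you would rather lean on the standard library, functools.partial behaves like the hand-written partial for positional arguments, and a left-to-right compose can be built on functools.reduce. This is an alternative sketch, not the implementation used above; replace keeps the same argument order as the helper defined earlier (string last).

```python
from functools import partial, reduce

def compose(*fs):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)

def replace(pat, rep, s):
    # The string comes last so that partial can fix pat and rep first.
    return s.replace(pat, rep)

make_words = compose(str.lower, partial(replace, '_', ' '))

print(make_words('Foo_Bar'))
```

With no functions at all, compose() returns its argument unchanged, which matches the behavior of the recursive version.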

Really, this question couldn't have set up a better stage for some of these bullet points. I'm going to revisit the choice of set as used in the original question (and in the program above) because there's a huge problem: if you re-run the program several times, you may get a different output. In fancier words, we have no referential transparency. That's because Python's sets are unordered: when we convert an ordered list to a set and then back to a list, the order of the elements is not guaranteed, so the elements that survive the clip can differ from run to run.

Using set this way shows good intuition on how to solve the uniques problem using existing language features, but we want to restore referential transparency. In our program above, we clearly encoded our intention of getting an input's unique elements by calling the unique function on it.

# deterministic implementation of unique
def unique (iter):
  result = list ()
  seen = set ()
  for x in iter:
    if x not in seen:
      seen .add (x)
      result .append (x)
  return result

Now when we run our program, we always get the same result

print (program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e'))
# ['a', ' ', 'c', 'd']
# always the same output now

This brings me to another point. Because we abstracted unique into its own function, we're automatically given a scope in which to define its behavior. I chose to use imperative style in unique's implementation, but that's fine: it is still a pure function, and the consumer of the function cannot tell the difference. You can come up with 100 other implementations of unique; so long as program works, it doesn't matter.
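As one example of such an alternative implementation, here is a sketch that relies on dict keys being unique and keeping insertion order (guaranteed since Python 3.7). Like the set-based versions, it assumes the elements are hashable. The parameter name iterable is my own choice, to avoid shadowing the built-in iter.

```python
def unique(iterable):
    # dict.fromkeys keeps the first occurrence of each key, in input order.
    return list(dict.fromkeys(iterable))

print(unique(['a', ' ', 'a', 'c', ' ', 'd']))
```

Swapping this in for the loop-based unique should leave the output of program unchanged, since both keep first occurrences in order.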

Functional programming is about functions. The language is yours to tame. It's still a regular Python program.

def fwd (x):
  return lambda k: fwd (k (x))

def program (word = '', clip_length = 5, input = ''):
  make_words = \
    compose ( lower
            , partial (replace, '_', ' ')
            )

  fwd (input)                               \
    (partial (map, make_words))             \
    (partial (filter, lambda x: x != word)) \
    (unique)                                \
    (partial (take, clip_length))           \
    (print)

program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e')
# ['a', ' ', 'c', 'd']
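To see how fwd threads a value through the chain, it may help to trace it on a tiny input. Each call applies the given function to the wrapped value and wraps the result again, so the chain never returns the bare value; the last step has to consume it (print above, a list's append in this sketch). The list name collected and the sample string are made up for illustration.

```python
def fwd(x):
    # Wrap a value; calling the wrapper with a function k applies k to the
    # value and wraps the result again, so calls can be chained.
    return lambda k: fwd(k(x))

collected = []

fwd('A_b')                           \
    (str.lower)                      \
    (lambda s: s.replace('_', ' '))  \
    (collected.append)

print(collected)
```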

Touch and experiment with this program on repl.it
