
Clean pipeline of operations in Python

I have a long pipeline that applies various operations to a list of strings, input_list. The pipeline maps each word to lowercase, replaces underscores with spaces, filters out a specific word, removes duplicates, and clips the result to a certain length.

result = list(set(filter(lambda x : x != word, map(lambda x : x.lower().replace('_',' '), input_list))))[:clip_length]

My problem with this is that it's not very readable: it's not clear what the input to this pipeline is or in what order the operations are applied. It hurts a bit to look at, and I probably won't know what it does later on unless it's been nicely commented.

Is there any way to write a pipeline in Python where I can clearly see which operations happen in what order, what goes in, and what comes out? To be more specific, I'd like to be able to write it so that operations go either right-to-left or left-to-right, not inner-to-outer.

That's functional style, which you read from the innermost expression towards the outermost.

Putting it on multiple lines with some comments can help readability:

result = list(                                # (5) convert to list
  set(                                        # (4) convert to set (remove dupes)
    filter(
      lambda x: x != word,                    # (3) keep items that differ from word
      map(
        lambda x: x.lower().replace('_',' '), # (2) apply transformation
        input_list                            # (1) take input_list
      )
    )
  )
)[:clip_length]                               # (6) limit number of results

It's a matter of taste. I tend to prefer single expressions like this, with minimal formatting so that it fits nicely:

result = list(set(filter(lambda x : x != word,
    map(lambda x : x.lower().replace('_',' '), input_list))))[:clip_length]

An equivalent imperative-style version is:

result = set()
for x in input_list:
    x = x.lower().replace('_', ' ')
    if x != word:
        result.add(x)
result = list(result)[:clip_length]
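
A middle ground between the two, offered here as a minimal sketch rather than as part of the answer above, is to name each intermediate step so the pipeline reads top to bottom. The function name clean and the intermediate variable names are hypothetical; the behavior is meant to match the original one-liner, including the fact that going through a set does not preserve order.

```python
def clean(input_list, word, clip_length):
    # (1) normalize each item: lowercase, underscores to spaces
    normalized = (x.lower().replace('_', ' ') for x in input_list)
    # (2) drop the excluded word
    kept = (x for x in normalized if x != word)
    # (3) remove duplicates (a set does not preserve order)
    unique_items = set(kept)
    # (4) limit the number of results
    return list(unique_items)[:clip_length]


result = clean(['Foo_Bar', 'foo_bar', 'SKIP', 'baz'], 'skip', 5)
print(sorted(result))
```

The generator expressions are lazy, so nothing is computed until the set is built in step (3).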

Well, it's functional, but it has no (consistent) style. The "problem" is the wide variety of syntaxes used in this expression.

  • calling a function is done with normal prefix notation f(arg)
  • getting a sub-array uses a special syntax arr[n?:m?], instead of a function slice(n,m)
  • set is a completely different type, but it is used as an intermediate step because sets happen to have some of the behavior we want. What we want is the "unique" elements of an iterable, so our function should be called unique. If we happen to implement unique using a set, that's fine, but that is not the concern of the reader, whose mind is free from such distractions
  • x.lower() is a dynamic call with lower in infix position. Compare to prefix position lower(x). The same applies to s.replace(pat,rep) vs replace(s, pat, rep)
  • map and filter, however, do have a functional interface: map(f,iter) and filter(f,iter)

But writing a program like the one you've shared sort of misses out on functional style's strongest and most versatile trait: the function. Yes, functional programming is also about composing beautiful chains of expressions, but not at the cost of readability! If readability starts to hurt, make it better with... a function :D

Consider this program that uses a uniform functional style. It's still a regular Python program.

def program (word = '', clip_length = 5, input = ''):
  make_words = \
    compose ( lower
            , partial (replace, '_', ' ')
            )

  process = \
    compose ( partial (map, make_words)
            , partial (filter, lambda x: x != word)
            , unique
            , partial (take, clip_length)
            )

  return process (input)

print (program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e'))
# ['d', ' ', 'e', 'a']
# Note, your output may vary. More on this later.

And now the dependencies. Each function operates solely on its arguments and returns an output.

def partial (f, *xs):
  return lambda *ys: f (*xs, *ys)

def compose (f = None, *fs):
  def comp (x):
    if f is None:
      return x
    else:
      return compose (*fs) (f (x))
  return comp

def take (n = 0, xs = []):
  return xs [:n]

def lower (s = ''):
  return s .lower ()

def replace (pat = '', rep = '', s = ''):
  return s .replace (pat, rep)

def unique (iter):
  return list (set (iter))
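
Several of these helpers have close counterparts in the standard library, which may be worth knowing even if you keep the hand-written versions. The sketch below is not part of the original answer; the names underscores_to_spaces, make_word and keep_not_baz are hypothetical, and take is redefined here on top of itertools.islice so that it also works on lazy iterables such as map and filter objects.

```python
from functools import partial
from itertools import islice
from operator import methodcaller

# str.lower called through the class is already in prefix form: lower(s)
lower = str.lower

# methodcaller builds a function f such that f(s) == s.replace('_', ' ')
underscores_to_spaces = methodcaller('replace', '_', ' ')


def take(n, iterable):
    # islice accepts any iterable, not only sequences that support slicing
    return list(islice(iterable, n))


def make_word(s):
    return underscores_to_spaces(lower(s))


# functools.partial binds leading positional arguments,
# like the hand-written partial above
keep_not_baz = partial(filter, lambda x: x != 'baz')

words = list(map(make_word, ['Foo_Bar', 'BAZ', 'qux']))
print(words)                      # ['foo bar', 'baz', 'qux']
print(list(keep_not_baz(words)))  # ['foo bar', 'qux']
print(take(2, words))             # ['foo bar', 'baz']
```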

Really, this question couldn't have set up a better stage for some of these bullet points. I'm going to revisit the choice of set as used in the original question (and in the program above) because there's a huge problem: if you re-run our program several times, you may get a different output. In fancier words, we have no referential transparency. That's because Python's sets are unordered, and when we convert from an ordered list to a set and then back to a list, it's not guaranteed that we'll always get the elements in the same order, so the clipped result can differ too.

Using set this way shows good intuition for how to solve the uniqueness problem using existing language features, but we want to restore referential transparency. In our program above, we clearly encoded our intention of getting an input's unique elements by calling the unique function on it.

# deterministic implementation of unique
def unique (iter):
  result = list ()
  seen = set ()
  for x in iter:
    if x not in seen:
      seen .add (x)
      result .append (x)
  return result

Now when we run our program, we always get the same result:

print (program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e'))
# ['a', ' ', 'c', 'd']
# always the same output now

This brings me to another point. Because we abstracted unique into its own function, we're automatically given a scope in which to define its behavior. I chose to use imperative style in unique's implementation, but that's fine, as it is still a pure function and the consumer of the function cannot tell the difference. You can come up with 100 other implementations of unique; so long as program works, it doesn't matter.

Functional programming is about functions. The language is yours to tame. It's still a regular Python program.
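
As one illustration of swapping the implementation without the caller noticing, here is a minimal sketch that relies on dict preserving insertion order (guaranteed since Python 3.7). The name unique_ordered is hypothetical and is used only to avoid clashing with the definition above; it assumes the items are hashable, just like the set-based versions.

```python
def unique_ordered(iterable):
    # dict keys are unique and keep first-insertion order (Python 3.7+)
    return list(dict.fromkeys(iterable))


print(unique_ordered(['a', ' ', 'a', ' ', 'c', 'a', 'd']))
# ['a', ' ', 'c', 'd']
```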

def fwd (x):
  return lambda k: fwd (k (x))

def program (word = '', clip_length = 5, input = ''):
  make_words = \
    compose ( lower
            , partial (replace, '_', ' ')
            )

  fwd (input)                               \
    (partial (map, make_words))             \
    (partial (filter, lambda x: x != word)) \
    (unique)                                \
    (partial (take, clip_length))           \
    (print)

program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e')
# ['a', ' ', 'c', 'd']

You can run and experiment with this program on repl.it.
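
If the chained-call syntax of fwd feels unusual, a similar left-to-right reading can be had from a small helper that takes the input first and the steps after it. This is a sketch under stated assumptions, not the answer's own code: pipe and clean are hypothetical names, functools.reduce does the threading, and dict.fromkeys stands in for an order-preserving unique.

```python
from functools import partial, reduce
from itertools import islice


def pipe(value, *fs):
    # feed value through each function in turn, left to right
    return reduce(lambda acc, f: f(acc), fs, value)


def clean(input_list, word, clip_length):
    return pipe(
        input_list,                                           # what goes in
        partial(map, lambda x: x.lower().replace('_', ' ')),  # normalize
        partial(filter, lambda x: x != word),                 # drop word
        dict.fromkeys,                                        # unique, ordered
        lambda xs: list(islice(xs, clip_length)),             # clip
    )


print(clean(['A_a', 'a_a', 'B', 'c', 'd', 'e'], 'b', 3))
# ['a a', 'c', 'd']
```

Unlike fwd, pipe returns the final value directly instead of a wrapper, so the result can be assigned or printed by the caller.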
