
What is the most Pythonic way to modify the function of a function?

I have a function I am using to read in files of a particular format. My function looks like this:

import csv
from collections import namedtuple

def read_file(f, name, header=True):
    with open(f, mode="r") as infile:
        reader = csv.reader(infile, delimiter="\t")
        if header is True:
            next(reader)
        gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
        for row in reader:
            row = gene_data(*row)
            yield row

I also have another type of file that I would like to read in with this function. However, the other file type needs a few slight parsing steps before I can use the read_file function. For example, trailing periods need to be stripped from column q and the characters atr need to be appended to the id column. Obviously, I could create a new function, or add some optional arguments to the existing function, but is there a simple way to modify this function so that it can be used to read in additional file type(s)? I was thinking of something along the lines of a decorator?
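For concreteness, the intended per-row pre-processing might look roughly like this (a sketch; the column indices assume the id, name, q, start, end, sym layout above):

def preprocess(row):
    # row is a plain list of strings as produced by csv.reader
    row[0] += "atr"              # append "atr" to the id column
    row[2] = row[2].rstrip(".")  # strip trailing periods from the q column
    return row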

IMHO, the most Pythonic way would be to turn the function into a base class, split the file operations into methods, and override those methods in new classes derived from the base class.
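A minimal sketch of that idea, similar in spirit to the class-based answer further below (the class names here are illustrative, not from the original answer):

import csv
from collections import namedtuple

GeneData = namedtuple("GeneData", "id, name, q, start, end, sym")

class BaseFileReader:
    def __init__(self, filename, header=True):
        self.filename, self.header = filename, header

    def transform(self, row):
        # no-op in the base class; subclasses override this
        return row

    def __iter__(self):
        with open(self.filename, newline="") as infile:
            reader = csv.reader(infile, delimiter="\t")
            if self.header:
                next(reader)
            for row in reader:
                yield GeneData(*self.transform(row))

class AtrFileReader(BaseFileReader):
    def transform(self, row):
        row[0] += "atr"              # append "atr" to the id column
        row[2] = row[2].rstrip(".")  # strip trailing periods from q
        return row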

Having such a monolithic function that takes a filename instead of an open file is by itself not very Pythonic. You are trying to implement a stream processor here (file stream -> line stream -> CSV record stream -> [transformer ->] data stream), so using a generator is actually a good idea. I'd slightly refactor this to be a bit more modular:

import csv
from collections import namedtuple

def csv_rows(infile, header):
    reader = csv.reader(infile, delimiter="\t")
    if header: next(reader)
    return reader

def data_sets(infile, header):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    for row in csv_rows(infile, header):
        yield gene_data(*row)

def read_file_type1(infile, header=True):
    # for this file type, we only need to pass the caller the raw 
    # data objects
    return data_sets(infile, header)

def read_file_type2(infile, header=True):
    # for this file type, we have to pre-process the data sets 
    # before yielding them. A good way to express this is using a
    # generator expression (we could also add a filtering condition here)
    return (transform_data_set(x) for x in data_sets(infile, header))

# Usage sample:
with open("...", "r") as f:
    for obj in read_file_type1(f):
        print(obj)
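transform_data_set is used in read_file_type2 but not shown; based on the transformations described in the question, a hedged sketch could be (the gene_data namedtuples are immutable, so _replace is used):

def transform_data_set(record):
    # record is a gene_data namedtuple produced by data_sets()
    return record._replace(id=record.id + "atr", q=record.q.rstrip("."))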

As you can see, we have to pass the header argument all the way through the function chain. This is a strong hint that an object-oriented approach would be appropriate here. The fact that we obviously face a hierarchical type structure here (basic data file, type1, type2) supports this.

I suggest you create a row iterator like the following:

with MyFile('f') as f:
    for entry in f:
        foo(entry)

You can do this by implementing a class for your own files that supports the context-manager protocol (__enter__ / __exit__) and the iterator protocol (__iter__ / __next__).
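A minimal sketch of such a class, assuming a tab-separated file laid out like the one in the question (the class name MyFile and the fields are illustrative):

import csv
from collections import namedtuple

GeneData = namedtuple("GeneData", "id, name, q, start, end, sym")

class MyFile:
    def __init__(self, filename, header=True):
        self.filename, self.header = filename, header

    def __enter__(self):
        self._infile = open(self.filename, newline="")
        self._reader = csv.reader(self._infile, delimiter="\t")
        if self.header:
            next(self._reader)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._infile.close()

    def __iter__(self):
        return (GeneData(*row) for row in self._reader)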

Next to it you may create a function open_my_file(filename) that determines the file type and returns the appropriate file object to work with. This might be a slightly enterprise-style approach, but it is worth implementing if you're dealing with multiple file types.
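A hedged sketch of such a factory, building on the MyFile sketch above and assuming the file type can be guessed from the extension (the extension and class name are made up for illustration):

class SpecialFile(MyFile):
    # hypothetical variant that pre-processes each record
    def __iter__(self):
        for record in super().__iter__():
            yield record._replace(id=record.id + "atr", q=record.q.rstrip("."))

def open_my_file(filename):
    # naive dispatch on the extension; adjust to however the file types are told apart
    if filename.endswith(".special.tsv"):
        return SpecialFile(filename)
    return MyFile(filename)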

The object-oriented way would be this: 面向对象的方式是这样的:

import csv
from collections import namedtuple

class GeneDataReader:

    _GeneData = namedtuple('GeneData', 'id, name, q, start, end, sym')

    def __init__(self, filename, has_header=True):
        self._ignore_1st_row = has_header
        self._filename = filename        

    def __iter__(self):
        for row in self._tsv_by_row():
            yield self._GeneData(*self.preprocess_row(row))

    def _tsv_by_row(self):
        with open(self._filename, 'r') as f:
            reader = csv.reader(f, delimiter='\t')
            if self._ignore_1st_row: 
                next(reader)
            for row in reader:
                yield row 

    def preprocess_row(self, row):
        # does nothing.  override in derived classes
        return row

class SpecializedGeneDataReader(GeneDataReader):

    def preprocess_row(self, row):
        row[0] += 'atr'
        row[2] = row[2].rstrip('.')
        return row    
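Usage might look like this (the filename is illustrative):

for record in SpecializedGeneDataReader("genes.tsv"):
    print(record.id, record.q)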

The simplest way would be to modify your currently working code with an extra argument.

def read_file(name, is_special=False, has_header=True):
    with open(name,'r') as infile:
        reader = csv.reader(infile, delimiter='\t')
        if has_header:
            next(reader)
        Data = namedtuple("Data", 'id, name, q, start, end, sym')
        for row in reader:
            if is_special:
                row[0] += 'atr'
                row[2] = row[2].rstrip('.')
            row = Data(*row)
            yield row
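Calling it for the second file type would then look like this (a hedged example; the filename is illustrative):

for record in read_file("special_genes.tsv", is_special=True):
    print(record)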

If you are looking for something less nested but still procedure based:

GeneData = namedtuple("GeneData", 'id, name, q, start, end, sym')

def tsv_by_row(name, has_header=True):
    with open(name, 'r') as infile:
        reader = csv.reader(infile, delimiter='\t')
        if has_header: next(reader)
        for row in reader:
            yield row

def gene_data_from_vanilla_file(name, has_header=True):
    for row in tsv_by_row(name, has_header):
        yield GeneData(*row)

def gene_data_from_special_file(name, has_header=True):
    for row in tsv_by_row(name, has_header):
        row[0] += 'atr'
        row[2] = row[2].rstrip('.')
        yield GeneData(*row)

What about passing a callback function to read_file()?
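A hedged sketch of that idea, adding an optional transform parameter to the question's function (the parameter name is illustrative):

import csv
from collections import namedtuple

def read_file(f, transform=None, header=True):
    GeneData = namedtuple("GeneData", 'id, name, q, start, end, sym')
    with open(f, mode="r") as infile:
        reader = csv.reader(infile, delimiter="\t")
        if header:
            next(reader)
        for row in reader:
            if transform is not None:
                row = transform(row)
            yield GeneData(*row)

def special_transform(row):
    row[0] += 'atr'
    row[2] = row[2].rstrip('.')
    return row

# rows = read_file("genes.tsv", transform=special_transform)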

In the spirit of Niklas B.'s answer:

import csv, functools
from collections import namedtuple

def consumer(func):
    @functools.wraps(func)
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)
        return g
    return start

def csv_rows(infile, header, dest):
    reader = csv.reader(infile, delimiter='\t')
    if header: next(reader)
    for line in reader:
        dest.send(line)

@consumer
def data_sets(dest):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    while 1:
        row = (yield)
        dest.send(gene_data(*row))

def read_file_1(infile, header=True):
    # infile is an open file object, as in the earlier answers
    results, sink = getsink()
    csv_rows(infile, header, data_sets(sink))
    return results

def getsink():
    r = []
    @consumer
    def _sink():
        while 1:
            x = (yield)
            r.append(x)
    return (r, _sink())

@consumer
def transform_data_sets(dest):
    while True:
        data = (yield)
        dest.send(data[::-1]) # or whatever

def read_file_2(infile, header=True):
    results, sink = getsink()
    csv_rows(infile, header, data_sets(transform_data_sets(sink)))
    return results
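Usage would mirror the earlier generator version (a hedged example; the filename is illustrative):

with open("genes.tsv", "r") as f:
    for record in read_file_2(f):
        print(record)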
