
What is the most Pythonic way to modify the function of a function?

I have a function that I use to read files of a particular format. My function looks like this:

import csv
from collections import namedtuple

def read_file(f, name, header=True):
    with open(f, mode="r") as infile:
        reader = csv.reader(infile, delimiter="\t")
        if header is True:
            next(reader)
        gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
        for row in reader:
            row = gene_data(*row)
            yield row

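I have another type of file that I would like to read with this function. However, the other file type needs a few slight parsing steps before I can use the read_file function. For example, trailing periods need to be stripped from column q, and the characters atr need to be appended to the id column. Obviously, I could create a new function or add some optional arguments to the existing one, but is there a simple way to modify this function so that it can also read the other file type? I was thinking of something along the lines of a decorator?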
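For reference, the decorator idea could look roughly like the sketch below. It is only an illustration under some assumptions: the reader takes an already open file instead of a filename so that the example runs without touching the disk, and the names map_rows, read_rows, fix_special and read_special_rows are hypothetical, not part of the original code.

```python
import csv
import functools
import io
from collections import namedtuple

GeneData = namedtuple("GeneData", "id, name, q, start, end, sym")


def map_rows(transform):
    """Decorator factory (hypothetical name): apply `transform` to every
    item yielded by the decorated generator function."""
    def decorator(gen_func):
        @functools.wraps(gen_func)
        def wrapper(*args, **kwargs):
            for item in gen_func(*args, **kwargs):
                yield transform(item)
        return wrapper
    return decorator


def read_rows(infile, header=True):
    """Yield one GeneData record per tab-separated line of an open file."""
    reader = csv.reader(infile, delimiter="\t")
    if header:
        next(reader, None)  # skip the header row if there is one
    for row in reader:
        yield GeneData(*row)


def fix_special(record):
    # The two adjustments described in the question: append "atr" to id
    # and strip trailing periods from q.
    return record._replace(id=record.id + "atr", q=record.q.rstrip("."))


# Build the reader for the second file type without touching read_rows.
read_special_rows = map_rows(fix_special)(read_rows)

sample = "id\tname\tq\tstart\tend\tsym\ng1\tfoo\t0.05.\t10\t20\tABC\n"
plain = list(read_rows(io.StringIO(sample)))
special = list(read_special_rows(io.StringIO(sample)))
```

Applying the decorator by hand, rather than with the @ syntax, keeps the undecorated read_rows available for the first file type.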

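IMHO, the most Pythonic way would be to convert the function into a base class, split the file operations into methods, and override those methods in new classes derived from the base class.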

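Having such a monolithic function that takes a filename instead of an open file is by itself not very Pythonic. You are trying to implement a stream processor here (file stream -> line stream -> CSV record stream -> [transformator ->] data stream), so using a generator is actually a good idea. I'd slightly refactor this to be a bit more modular: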

import csv
from collections import namedtuple

def csv_rows(infile, header):
    reader = csv.reader(infile, delimiter="\t")
    if header: next(reader)
    return reader

def data_sets(infile, header):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    for row in csv_rows(infile, header):
        yield gene_data(*row)

def read_file_type1(infile, header=True):
    # for this file type, we only need to pass the caller the raw 
    # data objects
    return data_sets(infile, header)

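def transform_data_set(record):
    # example transformation, taken from the question: append "atr" to
    # the id and strip trailing periods from q
    return record._replace(id=record.id + "atr", q=record.q.rstrip("."))

def read_file_type2(infile, header=True):
    # for this file type, we have to pre-process the data sets
    # before yielding them. A good way to express this is using a
    # generator expression (we could also add a filtering condition here)
    return (transform_data_set(x) for x in data_sets(infile, header))

# Usage sample:
with open("...", "r") as f:
    for obj in read_file_type1(f):
        print(obj)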

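As you can see, we have to pass the header argument all the way through the function chain. This is a strong hint that an object-oriented approach would be appropriate here. The fact that we are obviously dealing with a hierarchical type structure (basic data file, type 1, type 2) supports this.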

I suggest you create a row iterator like the following:

with MyFile('f') as f:
    for entry in f:
        foo(entry)

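You can do this by implementing a class for your own files that has the traits this usage requires: context-manager support (__enter__ and __exit__) for the with statement, and iteration (__iter__) for the for loop.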

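Alongside it, you could create a function open_my_file(filename) that determines the file type and returns the appropriate file object to work with. This might be a slightly enterprise way of doing it, but it is worth implementing if you are dealing with multiple file types.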
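A minimal sketch of such a class and factory might look like this. Several things here are assumptions made for illustration: the subclass MySpecialFile, the process hook, and above all the rule that the file type can be recognised from a ".special" filename extension. A real open_my_file would use whatever actually distinguishes the file types.

```python
import csv
import os
import tempfile


class MyFile:
    """Context manager and row iterator for a tab-separated file."""

    def __init__(self, filename, has_header=True):
        self._filename = filename
        self._has_header = has_header
        self._handle = None

    def __enter__(self):
        self._handle = open(self._filename, "r", newline="")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._handle.close()
        return False  # do not suppress exceptions

    def __iter__(self):
        reader = csv.reader(self._handle, delimiter="\t")
        if self._has_header:
            next(reader, None)  # skip the header row if there is one
        for row in reader:
            yield self.process(row)

    def process(self, row):
        # Hook for subclasses (hypothetical name); the base class
        # returns the row unchanged.
        return row


class MySpecialFile(MyFile):
    def process(self, row):
        row[0] += "atr"              # append "atr" to the id column
        row[2] = row[2].rstrip(".")  # strip trailing periods from q
        return row


def open_my_file(filename, has_header=True):
    # Assumption for this sketch: the file type is encoded in the extension.
    cls = MySpecialFile if filename.endswith(".special") else MyFile
    return cls(filename, has_header)


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genes.special")
    with open(path, "w", newline="") as out:
        out.write("id\tname\tq\tstart\tend\tsym\ng1\tfoo\t0.05.\t10\t20\tABC\n")
    with open_my_file(path) as f:
        rows = list(f)
```

The factory only picks a class; the file itself is opened when the with block is entered, so the object is cheap to create.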

The object-oriented way would be this:

import csv
from collections import namedtuple

class GeneDataReader:

    _GeneData = namedtuple('GeneData', 'id, name, q, start, end, sym')

    def __init__(self, filename, has_header=True):
        self._ignore_1st_row = has_header
        self._filename = filename

    def __iter__(self):
        for row in self._tsv_by_row():
            yield self._GeneData(*self.preprocess_row(row))

    def _tsv_by_row(self):
        with open(self._filename, 'r') as f:
            reader = csv.reader(f, delimiter='\t')
            if self._ignore_1st_row: 
                next(reader)
            for row in reader:
                yield row 

    def preprocess_row(self, row):
        # does nothing.  override in derived classes
        return row

class SpecializedGeneDataReader(GeneDataReader):

    def preprocess_row(self, row):
        row[0] += 'atr'
        row[2] = row[2].rstrip('.')
        return row    

The simplest way would be to modify your currently working code with an extra argument.

def read_file(name, is_special=False, has_header=True):
    with open(name,'r') as infile:
        reader = csv.reader(infile, delimiter='\t')
        if has_header:
            next(reader)
        Data = namedtuple("Data", 'id, name, q, start, end, sym')
        for row in reader:
            if is_special:
                row[0] += 'atr'
                row[2] = row[2].rstrip('.')
            row = Data(*row)
            yield row

If you are looking for something less nested but still procedural:

GeneData = namedtuple('GeneData', 'id, name, q, start, end, sym')

def tsv_by_row(name, has_header=True):
    with open(name, 'r') as infile:
        reader = csv.reader(infile, delimiter='\t')
        if has_header: next(reader)
        for row in reader:
            yield row

def gene_data_from_vanilla_file(name, has_header=True):
    for row in tsv_by_row(name, has_header):
        yield GeneData(*row)

def gene_data_from_special_file(name, has_header=True):
    for row in tsv_by_row(name, has_header):
        row[0] += 'atr'
        row[2] = row[2].rstrip('.')
        yield GeneData(*row)

How about passing a callback function to read_file()?
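A rough sketch of what that could look like, assuming a hypothetical preprocess parameter and a read_file that takes an open file so the example can run on an in-memory string (the original takes a filename):

```python
import csv
import io
from collections import namedtuple

Data = namedtuple("Data", "id, name, q, start, end, sym")


def read_file(infile, has_header=True, preprocess=None):
    """Yield Data records; `preprocess` (hypothetical parameter) is an
    optional callback applied to each raw row before it is wrapped."""
    reader = csv.reader(infile, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if there is one
    for row in reader:
        if preprocess is not None:
            row = preprocess(row)
        yield Data(*row)


def fix_special_row(row):
    # The adjustments from the question for the second file type.
    row[0] += "atr"
    row[2] = row[2].rstrip(".")
    return row


sample = "id\tname\tq\tstart\tend\tsym\ng1\tfoo\t0.05.\t10\t20\tABC\n"
vanilla = list(read_file(io.StringIO(sample)))
special = list(read_file(io.StringIO(sample), preprocess=fix_special_row))
```

Each new file type then only needs its own small row function, and read_file itself stays untouched.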

In the spirit of Niklas B.'s answer:

import csv, functools
from collections import namedtuple

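def consumer(func):
    # decorator that primes a coroutine so it is ready to receive send()
    @functools.wraps(func)
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)  # advance to the first yield
        return g
    return start

def csv_rows(infile, header, dest):
    # push each parsed row of the open file into the dest coroutine
    reader = csv.reader(infile, delimiter='\t')
    if header: next(reader)
    for line in reader:
        dest.send(line)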
@consumer
def data_sets(dest):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    while 1:
        row = (yield)
        dest.send(gene_data(*row))

def read_file_1(infile, header=True):
    # infile must be an open file object, since csv_rows reads from it directly
    results, sink = getsink()
    csv_rows(infile, header, data_sets(sink))
    return results

def getsink():
    r = []
    @consumer
    def _sink():
        while 1:
            x = (yield)
            r.append(x)
    return (r, _sink())

@consumer
def transform_data_sets(dest):
    while True:
        data = (yield)
        dest.send(data[::-1]) # or whatever

def read_file_2(infile, header=True):
    # infile must be an open file object, since csv_rows reads from it directly
    results, sink = getsink()
    csv_rows(infile, header, data_sets(transform_data_sets(sink)))
    return results

