
Storing data for repeated function calls in Python

I have a project in which I run multiple data sets through a specific function that "cleans" them.

The cleaning function looks like this (Misc.py):

import sys

def clean(my_data):
    sys.stdout.write("Cleaning genes...\n")

    synonyms = FileIO("raw_data/input_data", 3, header=False).openSynonyms()
    clean_data = {}

    # Iterate over a snapshot of the keys, since entries are deleted inside the loop.
    for g in list(my_data):
        if g in synonyms:
            # Found a data point which appears in the synonym list.
            for synonym in synonyms[g]:
                if synonym in my_data:
                    del my_data[synonym]
                    clean_data[g] = synonym
                    sys.stdout.write("\t%s is also known as %s\n" % (g, clean_data[g]))
    return my_data

FileIO is a custom class I made to open files.

My question is this: the function will be called many times throughout the program's life cycle. I want to avoid reading input_data on every call, since its contents are the same every time. I know that I can read it once and pass it in as an argument, like this:

def clean(my_data, synonyms=None):
    if synonyms is None:
       ...
    else:
       ...

But is there another, better looking way of doing this?

My file structure is the following:

lib
    Misc.py
    FileIO.py
    __init__.py
    ...
raw_data
runme.py

From runme.py, I do from lib import * and call all the functions I made.

Is there a Pythonic way to go about this? Something like a 'memory' for the function?

Edit: this line: synonyms = FileIO("raw_data/input_data", 3, header=False).openSynonyms() returns a collections.OrderedDict() built from input_data, using the 3rd column as the key of the dictionary.

The dictionary for the following dataset:

column1    column2    key    data
  ...        ...      A      B|E|Z
  ...        ...      B      F|W
  ...        ...      C      G|P
  ...

Will look like this:

OrderedDict([('A',['B','E','Z']), ('B',['F','W']), ('C',['G','P'])])

This tells my script that A is also known as B, E, Z; B as F, W; etc.

So these are the synonyms. Since the synonyms list will never change throughout the life of the code, I want to read it just once and re-use it.

Use a class with a __call__ method. You can call instances of this class like functions, and they keep their data between calls. Data that never changes, such as your synonyms, is best loaded and stored by the constructor. What you get this way is known as a 'functor' or 'callable object'.

Example:

class Incrementer:
    def __init__ (self, increment):
        self.increment = increment

    def __call__ (self, number):
        return self.increment + number

incrementerBy1 = Incrementer (1)

incrementerBy2 = Incrementer (2)

print (incrementerBy1 (3))
print (incrementerBy2 (3))

Output:

4
5

[EDIT]

Note that you can combine the answer of @Tagc with my answer to create exactly what you're looking for: a 'function' with built-in memory.

Name your class Clean rather than DataCleaner, and name the instance clean. Name the method __call__ rather than clean.
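A minimal sketch of that combination, not the asker's actual code: load_synonyms is a hypothetical stand-in for FileIO(...).openSynonyms() that returns the sample dictionary from the question, and the cleaning logic is simplified to dropping keys that are synonyms of another key still present.

```python
from collections import OrderedDict


def load_synonyms(path):
    # Hypothetical stand-in for FileIO(path, 3, header=False).openSynonyms().
    return OrderedDict([('A', ['B', 'E', 'Z']), ('B', ['F', 'W']), ('C', ['G', 'P'])])


class Clean(object):
    def __init__(self, synonym_file):
        # The synonym file is read once, when the instance is created.
        self.synonyms = load_synonyms(synonym_file)

    def __call__(self, data):
        # Simplified cleaning: drop every key that is a synonym of a key still present.
        for g in list(data):
            if g in data and g in self.synonyms:
                for synonym in self.synonyms[g]:
                    data.pop(synonym, None)
        return data


clean = Clean('raw_data/input_data')      # reads the synonyms once
result = clean({'A': 1, 'B': 2, 'F': 3, 'C': 4})
print(sorted(result))                     # ['A', 'C', 'F']
```

Every later call to clean(...) reuses self.synonyms, so the calling code looks the same as with a plain function.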

"Like a 'memory' for the function"

Half-way to rediscovering object-oriented programming.

Encapsulate the data cleaning logic in a class, such as DataCleaner. Have instances read the synonym data once, when they are instantiated, and retain it as part of their state. Have the class expose a clean method that operates on the data:

class FileIO(object):
    def __init__(self, file_path, some_num, header):
        pass

    def openSynonyms(self):
        return []

class DataCleaner(object):
    def __init__(self, synonym_file):
        self.synonyms = FileIO(synonym_file, 3, header=False).openSynonyms()

    def clean(self, data):
        for g in data:
            if g in self.synonyms:
                # ...
                pass

if __name__ == '__main__':
    dataCleaner = DataCleaner('raw_data/input_file')
    dataCleaner.clean('some data here')
    dataCleaner.clean('some more data here')

As a possible future optimisation, you can extend this approach with a factory method that creates DataCleaner instances and caches them by synonym file, so the expensive read is not repeated for the same file.
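One way such a factory could look, as a rough sketch: for_file and the _instances cache are hypothetical names, and load_synonyms is a stand-in for the FileIO read that also records how often it runs.

```python
load_calls = []  # records every (simulated) file read


def load_synonyms(path):
    # Hypothetical stand-in for FileIO(path, 3, header=False).openSynonyms().
    load_calls.append(path)
    return {'A': ['B', 'E', 'Z']}


class DataCleaner(object):
    _instances = {}  # cache: synonym file path -> DataCleaner instance

    def __init__(self, synonym_file):
        self.synonyms = load_synonyms(synonym_file)

    @classmethod
    def for_file(cls, synonym_file):
        # Build one instance per synonym file and reuse it afterwards.
        if synonym_file not in cls._instances:
            cls._instances[synonym_file] = cls(synonym_file)
        return cls._instances[synonym_file]


a = DataCleaner.for_file('raw_data/input_data')
b = DataCleaner.for_file('raw_data/input_data')
print(a is b, len(load_calls))  # True 1
```

Calling DataCleaner(...) directly still creates a fresh instance; only the factory method goes through the cache.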

I think the cleanest way to do this would be to decorate your "clean" (pun intended) function with another function that supplies the synonyms to it. This is, IMO, cleaner and more concise than creating another custom class, yet it still lets you easily change the "input_data" file if you need to (factory function):

def defineSynonyms(datafile):
    # Read the synonym file once, when the decorator is applied,
    # rather than on every call of the wrapped function.
    synonyms = FileIO(datafile, 3, header=False).openSynonyms()

    def wrap(func):
        def wrapped(*args, **kwargs):
            kwargs['synonyms'] = synonyms
            return func(*args, **kwargs)
        return wrapped
    return wrap

@defineSynonyms("raw_data/input_data")
def clean(my_data, synonyms=None):
    # do stuff with synonyms and my_data...
    pass
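A self-contained sketch of this decorator pattern that can be run as is. It assumes the file is loaded once, at decoration time; load_synonyms is a hypothetical stand-in for FileIO(...).openSynonyms(), the load_calls list only counts how often the file would be read, and functools.wraps is added so the decorated function keeps its name.

```python
import functools

load_calls = []  # records every (simulated) file read


def load_synonyms(datafile):
    # Hypothetical stand-in for FileIO(datafile, 3, header=False).openSynonyms().
    load_calls.append(datafile)
    return {'A': ['B', 'E', 'Z'], 'B': ['F', 'W']}


def define_synonyms(datafile):
    synonyms = load_synonyms(datafile)  # read once, at decoration time

    def wrap(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            # Supply the cached synonyms unless the caller passes their own.
            kwargs.setdefault('synonyms', synonyms)
            return func(*args, **kwargs)
        return wrapped
    return wrap


@define_synonyms('raw_data/input_data')
def clean(my_data, synonyms=None):
    # Toy body: return the keys of my_data that have an entry in the synonym table.
    return [g for g in my_data if g in synonyms]


first = clean({'A': 1, 'X': 2})
second = clean({'B': 3})
print(first, second, len(load_calls))  # ['A'] ['B'] 1
```

Because the load happens when the module is imported, the cost is paid even if clean is never called; whether that is acceptable depends on the program.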
