python / pandas - MultiIndexing - eliminate the use of global variables

I am using pandas to import a dataframe from Excel so that I can sort it, make changes, and run some simple addition and division on the data.

My code works, but it uses global variables throughout. I think this is poor practice and I want to eliminate these global variables, but I am not sure how to go about it.

In particular, I don't see how I can keep modifying my dataframe with indexing and slicing without declaring global variables.

import pandas as pd

df = pd.read_excel('data.xlsx')

# add the derived column while 'apple' and 'cherry' are still ordinary columns
df['apples and cherries'] = df['apple'] + df['cherry']

new_indexes = df.set_index(['apple', 'cherry', 'banana'])

sliced = new_indexes.loc(axis=0)[pd.IndexSlice[:, 'fruits']]

total_fruits = sliced.loc[:, ['grapes', 'watermelon', 'orange']].sum(axis=1)

That's a snippet of my code. As you can see, I refer to the global variables in order to further modify my dataframe. I need to eliminate them, and I am trying to create functions to help clean up my code.

My main question is: how can I refer to my data and the changes I make to it without assigning global variables?

If I defined a class and turned the variables into attributes, would I be able to do something like this?

class MyDf:

    def __init__(self):
        self._df = None
        self._multi_index = None

    def get_df(self):
        self._df = pd.read_excel('data.xlsx')
        return self._df

    def add_totals(self):
        # sum while 'apple' and 'cherry' are still ordinary columns
        self._df['apples and cherries'] = self._df['apple'] + self._df['cherry']

    def set_index(self):
        self._multi_index = self._df.set_index(['apple', 'cherry', 'banana'])

 

Thank you

There are several things you could do, depending on the overall structure of your code and your goal. Without knowing more about your case, for example how the snippet you provided fits into the rest of your code, these are only possible solutions.

You could define a function that takes a dataframe as an argument, performs operations on it, and returns the modified dataframe. The function could also simply take a filename as its argument, so that the dataframe is created inside the function to begin with. If you do not need to refer to intermediate variables such as new_indexes or sliced later in the code, a function is probably a good way to go.
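A minimal sketch of the function approach, assuming the column names from the question. The function names prepare_fruit_frame and total_fruits are hypothetical, and a small in-memory frame with made-up values stands in for data.xlsx so the example runs on its own.

```python
import pandas as pd


def prepare_fruit_frame(df):
    """Return a new, multi-indexed frame with a derived total column.

    Hypothetical helper; the column names are taken from the question.
    """
    out = df.copy()  # leave the caller's frame untouched
    # Sum while 'apple' and 'cherry' are still ordinary columns.
    out['apples and cherries'] = out['apple'] + out['cherry']
    return out.set_index(['apple', 'cherry', 'banana'])


def total_fruits(indexed, columns=('grapes', 'watermelon', 'orange')):
    """Row-wise sum over the given columns."""
    return indexed.loc[:, list(columns)].sum(axis=1)


def main():
    # Stand-in for pd.read_excel('data.xlsx'); the values are made up.
    raw = pd.DataFrame({
        'apple': [1, 2],
        'cherry': [3, 4],
        'banana': [5, 6],
        'grapes': [1, 1],
        'watermelon': [2, 2],
        'orange': [3, 3],
    })
    indexed = prepare_fruit_frame(raw)
    print(total_fruits(indexed))


if __name__ == '__main__':
    main()
```

Every intermediate result here is either a local variable inside a function or a return value, so nothing has to live at module level except the function definitions themselves.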

You could also define a class, make the variables attributes of its instances, and write methods that perform the operations you want. This has the advantage that you can still access the intermediate results later if necessary.
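One way the class approach might look, again assuming the question's column names. The class name FruitData, its attributes, and the sample values are hypothetical; from_excel is included only to show where pd.read_excel would fit and is not called here.

```python
import pandas as pd


class FruitData:
    """Hypothetical wrapper that keeps each intermediate result as an attribute."""

    INDEX_COLUMNS = ['apple', 'cherry', 'banana']

    def __init__(self, df):
        self.df = df.copy()   # working copy of the raw data
        self.indexed = None   # filled in by set_index()

    @classmethod
    def from_excel(cls, path):
        """Alternative constructor that loads the frame from an Excel file."""
        return cls(pd.read_excel(path))

    def add_totals(self):
        # Sum while 'apple' and 'cherry' are still ordinary columns.
        self.df['apples and cherries'] = self.df['apple'] + self.df['cherry']
        return self  # returning self allows method chaining

    def set_index(self):
        self.indexed = self.df.set_index(self.INDEX_COLUMNS)
        return self


# Made-up sample data standing in for the spreadsheet.
sample = pd.DataFrame({
    'apple': [1, 2],
    'cherry': [3, 4],
    'banana': [5, 6],
    'grapes': [1, 1],
})
data = FruitData(sample).add_totals().set_index()
print(data.indexed)
```

Both data.df and data.indexed stay reachable through the instance, which is the main difference from the function approach.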
