
Proper use of 'self' in a Python script

I'm finally creating a class to analyse my data in a more streamlined way. It takes a CSV file and outputs some information about the table and its columns.

class Analyses:
    def Types_des_colonnes(self, df):
        tcol = df.columns.to_series().groupby(df.dtypes).groups
        tycol = {k.name: v for k, v in tcol.items()}
        return(self.tycol)

    def Analyse_table(self, table):
        # Returns a dict 'tycol' with the types as keys and the column names as values:
        Types_des_colonnes(table)
        nbr_types_colonnes_diff=len(tycol.keys())


        type_table = table.dtypes
        liste_columns = table.columns
        clef_types= tycol.keys()
        long_table = len(table)
        nbr_cols = len(liste_columns)

        print(table.describe())

        print('Nombre de colonnes: '+ str(nbr_cols))
        print('Nombre de types de colonnes différentes: '+str(nbr_types_colonnes_diff))
        for kk in range(0,nbr_types_colonnes_diff):
            print('Type: ' + tycol.keys()[kk])
            print(tycol.values())
        return(liste_columns)

    def Analyse_colonne(self, col):
        from numpy import where, nan
        from pandas import isnull,core,DataFrame
        # If col is a DataFrame:
        if type(col) == core.frame.DataFrame:
            dict_col = {}
            for co in col.columns:
                dict_col_Loc = Analyse_colonne(col[co]);
                dict_col[co] = dict_col_Loc.values()
            return(dict_col)
        elif type(col) == core.series.Series:    
            type_col = type(col)
            arr_null = where(isnull(col))[0]
            type_data = col.dtype
            col_uniq = col.unique()

            nbr_unique= len(col_uniq)
            taille_col= len(col)
            nbr_ligne_vide= len(arr_null)

            top_entree= col.head()
            bottom_entree= col.tail()
            pct_uniq= (float(nbr_unique)/float(taille_col))*100.0
            pct_ligne_vide= (float(nbr_ligne_vide)/float(taille_col))*100.0
            print('\n')
            print('       #################      '+col.name+'      #################')
            print('Type des données: ' + str(type_data))
            print('Taille de la colonne: ' + str(taille_col))
            if nbr_unique == 1:
                print('Aucune entrée unique')
            else:
                print('Nombre d\'uniques: '+ str(nbr_unique))
                print('Pourcentage d\'uniques: '+str(pct_uniq)+' %')
            if nbr_ligne_vide == 0:
                print('Aucune ligne vide')
            else:
                print('Nombre de lignes vides: '+ str(nbr_ligne_vide))
                print('Pourcentage de lignes vides: '+str(pct_ligne_vide)+' %')

            dict_col = {}
            dict_col[col.name] = arr_null
            return(dict_col)
        else:
            print('Problem')

def main():
    anly = Analyses()
    anly.Analyse_table(df_AIS)

if __name__ == '__main__':
    main()

When I run this script, I get:

NameError: name 'tycol' is not defined

Which refers to the second line of:

def Analyse_table():
        # Returns a dict 'tycol' with the types as keys and the column names as values:
        Types_des_colonnes(table)
        nbr_types_colonnes_diff=len(tycol.keys())

I know it has to do with using the 'self' properly, but I really don't understand how to do so properly. Could anybody show me how to solve this very easy problem?

(All the 'self' present in this script have been added by me only to try to make it work on my own.)

The members of a Python object are distinguished from other variables by appearing on the right-hand side of a dot (as in obj.member).

The first parameter of a method is bound to the object on which the method is called. By convention this parameter is named self; the name itself is not a technical requirement.
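As a rough illustration of that binding, here is a minimal sketch. The Counter class and its method names are made up for this example and do not come from the question. It shows that calling a method on an instance is the same as calling the function on the class with the instance passed explicitly, and that the first parameter does not have to be called self.

```python
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

    # Legal but unidiomatic: the first parameter is named 'this' instead of 'self'
    def reset(this):
        this.count = 0


c = Counter()
c.increment()                # Python passes c as the first argument
Counter.increment(c)         # the same call, with the instance passed explicitly
count_after_two = c.count    # both calls incremented the same object

c.reset()
count_after_reset = c.count
```

Sticking to the name self is still the better choice, since every Python reader expects it.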

tycol is a normal variable, entirely unassociated with the Analyses object. self.tycol is a different name.

Notice that you return self.tycol from Types_des_colonnes without ever giving it a value, which should raise an AttributeError. (Have you tried running the code exactly as you posted it in the question body?) You then discard the returned value at the call site.
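A minimal sketch of that failure, using a stripped-down, hypothetical version of the class with no pandas involved: the method assigns a local variable tycol but then reads the attribute self.tycol, which was never set.

```python
class Analyses:
    def types_des_colonnes(self):
        tycol = {"int64": ["a"]}   # local variable only
        return self.tycol          # the attribute was never assigned


try:
    Analyses().types_des_colonnes()
    error_name = None
except AttributeError as exc:
    # Reading an attribute that does not exist on the instance fails here
    error_name = type(exc).__name__
```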

You should either assign the result of Types_des_colonnes to a name in Analyse_table , or exclusively use the name self.tycol .

def Types_des_colonnes(self, df):
    # tcol is a local variable: we don't care about it after this,
    # and it ceases to exist when the method ends
    tcol = df.columns.to_series().groupby(df.dtypes).groups
    # self.tycol lives on the instance, so other methods can read it later
    self.tycol = {k.name: v for k, v in tcol.items()}

def Analyse_table(self, table):
    # Fills self.tycol, a dict with the types as keys and the column names as values.
    # The method has to be called through self:
    self.Types_des_colonnes(table)
    nbr_types_colonnes_diff = len(self.tycol.keys())
    # ...
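The other option mentioned above, assigning the returned value to a name in the caller, could look roughly like the sketch below. To keep it runnable without pandas, it assumes a hypothetical stand-in for the DataFrame: a plain dict mapping column names to type names. The lowercase method names are also illustrative.

```python
class Analyses:
    def types_des_colonnes(self, dtypes):
        # 'dtypes' is a hypothetical stand-in for the DataFrame:
        # a plain dict mapping column name -> type name
        tycol = {}
        for col, typ in dtypes.items():
            tycol.setdefault(typ, []).append(col)
        return tycol  # a local name, handed back to the caller

    def analyse_table(self, dtypes):
        # Call the method through self and keep the result in a local name
        tycol = self.types_des_colonnes(dtypes)
        return len(tycol)


anly = Analyses()
n_types = anly.analyse_table({"a": "int64", "b": "float64", "c": "int64"})
```

With this variant nothing is stored on the instance, which is reasonable when the dict is only needed inside one method.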

In the method Types_des_colonnes, you need to do: self.tycol = tycol. Also, you need to call the method "as a method", that is, through self: self.Types_des_colonnes(table). Take a week to read a book about Python to learn some basics. Programming is easy, but not that easy :)

A class is a data structure that contains "data and the methods that operate on that data". Note that I did not say 'functions': a method always has access to the data contained within the class, so the methods in a class are not 'functions' in the mathematical sense. But that's for another day, perhaps.

So, when do you use self? self represents the actual instance of the class on which you are invoking the method. If you have a class called Shape and two instances of Shape, a and b, then when you call a.area() the self object inside the area method refers to the instance named a, whereas when you invoke b.area() the self object refers to the instance named b.

In this way you can write a method that works for any instance of Shape . To make this more concrete, here's an example Shape class:

class Shape():
    def __init__(self, length_in, height_in):
        self.length = length_in
        self.height = height_in

    def area(self):
        return self.length * self.height

Here you can see that the data contained within the Shape class is length and height. Those values are assigned in __init__ (the constructor, called as a = Shape(3.0, 4.0)) and are stored as members of self. Afterward they can be accessed by the method area through the self object for calculations. These members can also be reassigned, and new members can be created. (Generally, though, members are only created in the constructor.)
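As a small usage sketch of the Shape class above (the instance names and the numbers are only illustrative), two instances each carry their own length and height, and area reads whichever instance it was called on:

```python
class Shape:
    def __init__(self, length_in, height_in):
        self.length = length_in
        self.height = height_in

    def area(self):
        return self.length * self.height


a = Shape(3.0, 4.0)
b = Shape(2.0, 5.0)
area_a = a.area()   # inside area, self is a: 3.0 * 4.0
area_b = b.area()   # inside area, self is b: 2.0 * 5.0

a.length = 10.0     # members can be reassigned after construction
area_a_resized = a.area()   # b is unaffected by the change to a
```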

This is all very weird compared to the other simple aspects of Python design. Yet, this is not unique to Python. In C++ there is a this pointer, that serves the same purpose, and in JavaScript the way that closures are used to create objects often uses a this variable to perform the same task as Python's self .

I hope this helps a little. I can expand on any other questions you have.

Also, it's generally a good idea to do import statements at the top of the file. There are reasons not to, but none of them are good enough for normal coders to use.

