
Identify unintentional read/write of global variables inside a python function? For example using static analysis?

One of the things I find frustrating about Python is that if I write a function like this:

def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b

And then run it like so:

SomeDict = {}
SomeDict['SomeKey'] = 0
b = UnintentionalValueChangeOfGlobal(10)
print(SomeDict['SomeKey'])

Python will: 1) find and use SomeDict during the function call even though I have forgotten to provide it as an input to the function; 2) permanently change the value of SomeDict['SomeKey'] even though it is not included in the return statement of the function.

For me this often leads to variables unintentionally changing values - SomeDict['SomeKey'] in this case becomes 110 after the function is called when the intent was to only manipulate the function output b .

In this case I would have preferred that Python: 1) crashes with an error inside the function saying that SomeDict is undefined; 2) under no circumstances permanently changes the value of any variable other than the output b after the function has been called.

I understand that it is not possible to disable the use of globals altogether in Python, but is there a simple method (a module, an IDE, etc.) which can perform static analysis on my Python functions and warn me when a function is using and/or changing the value of variables which are not the function's output? I.e., warn me whenever variables that are not local to the function are used or manipulated?

One of the reasons Python doesn't provide any obvious and easy way to prevent accessing (undeclared) global names in a function is that in Python everything (well, everything that can be assigned to a name at least) is an object, including functions, classes and modules, so preventing a function from accessing undeclared global names would make for quite verbose code... And nested scopes (closures etc.) don't help either.

And, of course, despite globals being evil, there ARE still legitimate reasons for mutating a global object sometimes. FWIW, even linters (well, pylint and pyflakes at least) don't seem to have any option to detect this AFAICT, but you'll have to double-check for yourself, as I might have overlooked it, or it might exist as a pylint extension or in another linter.

OTOH, I have very seldom had bugs coming from such an issue in 20+ years (I can't remember a single occurrence, actually). Routinely applying basic good practices (short functions that avoid side effects as much as possible, meaningful names and good naming conventions, unit-testing at least the critical parts, etc.) seems to be effective enough to prevent such issues.

One of the points here is that I have a rule that non-callable globals are to be considered (pseudo) constants, which is denoted by naming them ALL_UPPER. This makes it very obvious when you actually either mutate or rebind one...

As a more general rule: Python is by nature a very dynamic language (heck, you can even change the class of an object at runtime...) and with a "we're all consenting adults" philosophy, so it's indeed "lacking" most of the safety guards you'll find in more "B&D" languages like Java and relies instead on conventions, good practices and plain common sense.

Now, Python is not only very dynamic but also exposes much of its internals, so you can certainly (if this doesn't already exist) write a pylint extension that would at least detect global names in function code (hint: you can access the compiled code of a function object with yourfunc.func_code (py2) or yourfunc.__code__ (py3) and then inspect which names are used in the code, for example through the code object's co_names attribute). But unless you have to deal with a team of sloppy undisciplined devs (in which case you have another issue - there's no technical solution to stupidity), my very humble opinion is that you're wasting your time.
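The hint above can be tried out in a few lines. This is only a rough sketch for Python 3, not a pylint extension: names_referenced is a hypothetical helper name, and co_names also contains attribute names and imported names, so the result can over-report.

```python
import builtins


def names_referenced(func):
    """Hypothetical helper: non-local names that appear in func's code object.

    co_names holds global names but also attribute names, so treat the
    result as a list of candidates rather than a definitive answer.
    """
    code = func.__code__
    # Drop builtins such as len or print, which are rarely the problem.
    return [name for name in code.co_names if not hasattr(builtins, name)]


def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b


# The function is only inspected, never called, so SomeDict need not exist.
print(names_referenced(UnintentionalValueChangeOfGlobal))
```

Local names (a and b here) live in co_varnames and string constants such as 'SomeKey' live in co_consts, which is why neither shows up in the result.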

Ideally I would have wanted the global-checking functionality I'm searching for to be implemented within an IDE and continuously used to assess the use of globals in functions. But since that does not appear to exist, I threw together an ad hoc function which takes a Python function as input and then looks at the bytecode instructions of the function to see if there are any LOAD_GLOBAL or STORE_GLOBAL instructions present. If it finds any, it tries to determine the type of the global and compares it to a list of user-provided types (int, float, etc.). It then prints out the names of all global variables used by the function.

The solution is far from perfect and quite prone to false positives. For instance, if np.unique(x) is used in a function before numpy has been imported ( import numpy as np ) it will erroneously identify np as a global variable instead of a module. It will also not look into nested functions etc.

But for simple cases such as the example in this post it seems to work fine. I just used it to scan through all the functions in my codebase and it found another global usage that I was unaware of – so at least for me it is useful to have!

Here is the function:

def CheckAgainstGlobals(function, vartypes):
    """
    Function for checking if another function reads/writes data from/to global
    variables. Only variables of the types contained within 'vartypes' and
    unknown types are included in the output.

     Inputs:
        function - a python function
        vartypes - a list of variable types (int, float, dict,...)
     Example:
        # Define a function
        def testfcn(a):
            a = 1 + b
            return a

        # Check if the function reads/writes global variables.
        CheckAgainstGlobals(testfcn, [int, float, dict, complex, str])

        # Should output:
        >> Global-check of fcn: testfcn
        >> Loaded global variable: b (of unknown type)
    """
    import dis
    globalsFound = []
    # Step through each bytecode instruction in the function.
    for instr in dis.Bytecode(function):
        # Only instructions that load or store a global are of interest.
        if instr.opname == 'LOAD_GLOBAL':
            action = 'Loaded'
        elif instr.opname == 'STORE_GLOBAL':
            action = 'Stored'
        else:
            continue
        name = instr.argval
        # Try to look the name up in the namespace of the module where the
        # function was defined. If it is not defined (yet), the type is unknown.
        try:
            value = eval(name, function.__globals__)
        except NameError:
            found = ['%s global variable: %s (of unknown type)' % (action, name)]
        else:
            # Report the global only if it matches one of the requested types.
            found = ['%s global variable: %s (of type %s)' % (action, name, t)
                     for t in vartypes if isinstance(value, t)]
        for s in found:
            if s not in globalsFound:
                globalsFound.append(s)
    # Print out summary of detected global variable usage.
    fname = function.__code__.co_name
    if not globalsFound:
        print('\nGlobal-check of fcn: %s. No read/writes of global variables were detected.' % fname)
    else:
        print('\nGlobal-check of fcn: %s' % fname)
        for s in globalsFound:
            print(s)

When used on the function in the example directly after the function has been declared, it will warn about the usage of the global variable SomeDict, but it will not be aware of its type:

def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b
# Will find the global, but not know its type.
CheckAgainstGlobals(UnintentionalValueChangeOfGlobal,[int, float, dict, complex, str])

>> Global-check of fcn: UnintentionalValueChangeOfGlobal
>> Loaded global variable: SomeDict (of unknown type)

When used after SomeDict has been defined it also detects that the global is a dict:

SomeDict = {}
SomeDict['SomeKey'] = 0
b = UnintentionalValueChangeOfGlobal(10)
print(SomeDict['SomeKey'])
# Will find the global, and also see its type.
CheckAgainstGlobals(UnintentionalValueChangeOfGlobal,[int, float, dict, complex, str])

>> Global-check of fcn: UnintentionalValueChangeOfGlobal
>> Loaded global variable: SomeDict (of type <class 'dict'>)

Note: in its current state the function fails to detect that SomeDict['SomeKey'] changes value. I.e., it only detects the load instruction, not that the previous value of the global is manipulated. That is because assigning to an item of the dict compiles to a STORE_SUBSCR instruction rather than STORE_GLOBAL; STORE_GLOBAL only appears when the global name itself is rebound. But the use of the global is still detected (since it is being loaded), which is enough for me.
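The difference between the two instructions can be seen by listing the opcode names directly. The following is a small sketch using the standard dis module; RebindGlobal and Counter are made-up names added only for contrast.

```python
import dis


def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b


def RebindGlobal():
    # Hypothetical counter-example: rebinding the global name itself.
    global Counter
    Counter = 1


def opnames(func):
    """Return the opcode names used in func's bytecode."""
    return [instr.opname for instr in dis.get_instructions(func)]


mutating = opnames(UnintentionalValueChangeOfGlobal)
rebinding = opnames(RebindGlobal)

# Item assignment on a global dict: loaded as a global, stored via subscript.
print('LOAD_GLOBAL' in mutating, 'STORE_SUBSCR' in mutating, 'STORE_GLOBAL' in mutating)
# Rebinding a name declared with 'global': stored with STORE_GLOBAL.
print('STORE_GLOBAL' in rebinding)
```

So a checker that wants to flag mutation of a global, and not just its use, would have to follow the value loaded by LOAD_GLOBAL to a later STORE_SUBSCR (or to a mutating method call), which is considerably harder than matching single instructions.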

You can check whether the variable exists at module level using globals():

def UnintentionalValueChangeOfGlobal(a):

    if 'SomeDict' in globals():
        raise Exception('Var in globals')

    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b

SomeDict = {}
SomeDict['SomeKey'] = 0
b = UnintentionalValueChangeOfGlobal(10)
print(SomeDict['SomeKey'])
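A runtime variant in the same spirit, which does not require naming each variable by hand, is to rebuild the function so that it can only see builtins. This is a rough sketch under stated assumptions, not an established recipe: without_globals is a hypothetical decorator name, and a function wrapped this way also loses access to imported modules and to other module-level functions, so it only suits small self-contained functions.

```python
import builtins
import types


def without_globals(func):
    """Hypothetical decorator: return a copy of func that sees only builtins."""
    restricted = {'__builtins__': builtins}
    return types.FunctionType(func.__code__, restricted, func.__name__,
                              func.__defaults__, func.__closure__)


SomeDict = {'SomeKey': 0}


@without_globals
def UnintentionalValueChangeOfGlobal(a):
    SomeDict['SomeKey'] = 100 + a
    b = 0.5 * SomeDict['SomeKey']
    return b


error = None
try:
    UnintentionalValueChangeOfGlobal(10)
except NameError as exc:
    # The lookup of SomeDict fails inside the function, before any mutation.
    error = str(exc)

print(error)
print(SomeDict['SomeKey'])
```

With this wrapper the call fails inside the function with a NameError mentioning SomeDict, and the module-level dict is left untouched, which is close to the behaviour asked for in the question.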
