Python variable scope issue in script

I'm coming across some weirdness with a variable not being accessible in other functions after being set. This is a Celery task file named html.py

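import os
import subprocess
from datetime import datetime

# app (the Celery instance), settings and create_files come from the project and are not shown here
base_path = ''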

@app.task(bind=True)
def status(self):
    """
    returns the count of files downloaded and the timestamp of the most recently downloaded file
    """

    num_count = 0
    latest_timestamp = ''
    for root, _, filenames in os.walk(base_path):
        for filename in filenames:
            file_path = os.path.join(root, filename)
            file_timestamp = datetime.fromtimestamp(os.path.getctime(file_path))
            if latest_timestamp == '' or file_timestamp > latest_timestamp:
                latest_timestamp = file_timestamp
            num_count += 1

@app.task(bind = True)
def download(self, url='', cl_id=-1):
    if len(url) == 0 or cl_id < 0:
        return None

    base_path = settings.WGET_PATH + str(cl_id)

    log_paths = {
        'output': base_path + '/out.log',
        'rejected': base_path + '/rejected.log'
    }

    create_files(log_paths)
    wget_cmd = 'wget -prc --convert-links --html-extension --wait=3 --random-wait --no-parent ' \
               '--directory-prefix={0} -o {1} --rejected-log={2} {3}'.\
        format(base_path, log_paths['output'], log_paths['rejected'], url)

    subprocess.Popen(wget_cmd, shell=True)

When I call this via

from ingest.task import html
web_url = 'https://www.gnu.org/software/wget/manual/html_node/index.html'
ingest = html.download.delay(web_url, 54321)

the wget process kicks off as expected. However, the module-level base_path variable at the top of the file never gets set, so when I call status via

status = html.status.delay()

base_path is still an empty string, even though status is called after download. Is this because these tasks are in a script rather than a class?

Because in the download function, this line

base_path = settings.WGET_PATH + str(cl_id)

only creates a local variable named base_path; the module-level base_path is left unchanged. To rebind the module-level variable instead, declare base_path as global inside the function. For example:

@app.task(bind = True)
def download(self, url='', cl_id=-1):
    if len(url) == 0 or cl_id < 0:
        return None

    global base_path
    base_path = settings.WGET_PATH + str(cl_id)
...
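
The difference can be reproduced without Celery. The following is only a minimal sketch: '/tmp/wget/' is a made-up stand-in for settings.WGET_PATH, and the function names are hypothetical, not part of the original task file.

```python
base_path = ''  # module-level name, as in the question


def download_without_global(cl_id):
    # Plain assignment binds a new local name; the module-level
    # base_path is not touched. '/tmp/wget/' is a hypothetical path.
    base_path = '/tmp/wget/' + str(cl_id)
    return base_path


def download_with_global(cl_id):
    # The global statement makes the assignment rebind the
    # module-level name instead of creating a local one.
    global base_path
    base_path = '/tmp/wget/' + str(cl_id)
    return base_path


def status():
    # Only reads base_path, so it sees the module-level value.
    return base_path


download_without_global(54321)
after_local = status()    # module-level value is still ''

download_with_global(54321)
after_global = status()   # module-level value is now '/tmp/wget/54321'
```

Keep in mind that this only covers the Python scoping rule. If the Celery worker runs tasks in several processes, each process has its own copy of the module globals, so a later status call may still see the old value depending on which process handles it.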

From the Python docs:

At any time during execution, there are at least three nested scopes whose namespaces are directly accessible:

  • the innermost scope, which is searched first, contains the local names
  • the scopes of any enclosing functions, which are searched starting with the nearest enclosing scope, contain non-local, but also non-global names
  • the next-to-last scope contains the current module's global names
  • the outermost scope (searched last) is the namespace containing built-in names

If a name is declared global, then all references and assignments go directly to the middle scope containing the module's global names. Otherwise, all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).
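
The rules quoted above can be checked with a short script. This is a sketch with made-up names (counter, make_incrementer and so on), and it assumes Python 3, since it also uses the nonlocal statement for the enclosing-function case.

```python
counter = 0  # module-level (global) name


def read_only():
    # Reading an outer name needs no declaration.
    return counter + 1


def shadowing():
    # Plain assignment creates a new local name; the global is unchanged.
    counter = 100
    return counter


def make_incrementer():
    total = 0  # lives in the enclosing function's scope

    def increment():
        # nonlocal rebinds the name in the nearest enclosing function scope.
        nonlocal total
        total += 1
        return total

    return increment


def broken_update():
    # The assignment makes counter local to the whole function, so the
    # read on the right-hand side fails before anything is bound.
    counter = counter + 1
    return counter


results = {
    'read_only': read_only(),
    'shadowing': shadowing(),
    'global_after': counter,
}

inc = make_incrementer()
inc()
results['nonlocal'] = inc()

try:
    broken_update()
except UnboundLocalError:
    results['broken_update'] = 'UnboundLocalError'
```

After running it, results shows that shadowing returned its own local value while the module-level counter stayed at 0, and that mixing a read and an assignment of the same outer name without a global declaration raises UnboundLocalError.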
