I'm running into some odd behavior: a variable is not accessible in other functions after being set. This is a Celery task file named html.py:
base_path = ''

@app.task(bind=True)
def status(self):
    """
    returns the count of files downloaded and the timestamp of the most recently downloaded file
    """
    num_count = 0
    latest_timestamp = ''
    for root, _, filenames in os.walk(base_path):
        for filename in filenames:
            file_path = root + '/' + filename
            file_timestamp = datetime.fromtimestamp(os.path.getctime(file_path))
            if latest_timestamp == '' or file_timestamp > latest_timestamp:
                latest_timestamp = file_timestamp
            num_count += 1

@app.task(bind = True)
def download(self, url='', cl_id=-1):
    if len(url) == 0 or cl_id < 0:
        return None
    base_path = settings.WGET_PATH + str(cl_id)
    log_paths = {
        'output' : wget_base_path + '/out.log',
        'rejected' : wget_base_path + '/rejected.log'
    }
    create_files(log_paths)
    wget_cmd = 'wget -prc --convert-links --html-extension --wait=3 --random-wait --no-parent ' \
               '--directory-prefix={0} -o {1} --rejected-log={2} {3}'.\
        format(wget_base_path, log_paths['output'], log_paths['rejected'], url)
    subprocess.Popen(wget_cmd, shell = True)
When I call this via
from ingest.task import html
web_url = 'https://www.gnu.org/software/wget/manual/html_node/index.html'
ingest = html.download.delay(web_url, 54321)
the wget process kicks off as expected. However, the base_path variable at the top of the file never gets set, so when I call status via
status = html.status.delay()
the base_path variable is an empty string, despite status being called after download. Is this because these tasks are in a script vs a class?
Because in the function download, the line
base_path = settings.WGET_PATH + str(cl_id)
only creates a local variable named base_path; the module-level base_path is left unchanged. To avoid this, declare base_path as global inside the function. For example:
@app.task(bind = True)
def download(self, url='', cl_id=-1):
    if len(url) == 0 or cl_id < 0:
        return None
    global base_path
    base_path = settings.WGET_PATH + str(cl_id)
    ...
From the Python docs:
At any time during execution, there are at least three nested scopes whose namespaces are directly accessible:
If a name is declared global, then all references and assignments go directly to the middle scope containing the module's global names. Otherwise, all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).
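To see the rule from the docs in isolation, here is a minimal sketch without Celery. The function names set_local, set_global and read_base_path and the /tmp/wget/... paths are made up for illustration; only base_path comes from the question.

```python
base_path = ''  # module-level (global) name, as in the question


def set_local(value):
    # Assignment without `global` binds a new local name;
    # the module-level base_path is left untouched.
    base_path = value
    return base_path


def set_global(value):
    # `global` makes the assignment rebind the module-level name.
    global base_path
    base_path = value
    return base_path


def read_base_path():
    # Reading needs no declaration: the lookup falls through
    # to the module scope.
    return base_path


set_local('/tmp/wget/1')          # hypothetical path
after_local = read_base_path()    # still ''

set_global('/tmp/wget/2')         # hypothetical path
after_global = read_base_path()   # now '/tmp/wget/2'

print(repr(after_local), repr(after_global))
```

Keep in mind that a module-level global lives in a single Python process, so this only helps when both functions run in the same process.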