Python telemetry - Log all Python modules loaded

I'm creating a friendly Python development environment using venv and pip so users can start a Jupyter notebook with all the libs they might need on PYTHONPATH. To remove libs that users never use, I want to log every lib loaded during their Python execution to a centralized NAS share (they run their Python processes on their own hosts, but they can write to a central NAS share). I considered using strace -e trace=open,read python.. and filtering the entries by my libs' file paths, but I'm pretty sure there should be a standard Python lib for this kind of telemetry.

You could wrap builtins.__import__ to also log the name of each imported module, then run the code you want to monitor through the exec function (exec is likely unavoidable here, because the monitored script's if __name__ == '__main__': block still needs to run):

loggy.py

#!/usr/bin/env python3

import builtins
import sys

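if __name__ == '__main__':

    # Keep a reference to the original import function.
    builtins.__import_copy__ = builtins.__import__

    # Names of all modules requested through import statements.
    __libs = set()

    def __import_wrap__(*args,**kwargs):
        # args[0] is the module name passed to __import__.
        __libs.add(args[0])
        return builtins.__import_copy__(*args,**kwargs)

    builtins.__import__ = __import_wrap__

    # Drop loggy.py from argv so the target script sees its own arguments.
    sys.argv = sys.argv[1:]

    # Run the target in this namespace, where __name__ is still '__main__'.
    exec(open(sys.argv[0]).read())

    print(__libs)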

Test script test.py

#!/usr/bin/env python3

import re
import sys

if __name__ == '__main__':
    x = re.findall(r'all\w*',sys.argv[1])
    print(x)

For me (Python 3.6.9) this seems to work to log all modules imported by the secondary script:

mostly@ubuntu:~$ python3 loggy.py test.py "allways misspelling"
['allways']
{'_heapq', 'functools', 'collections.abc', 'operator', 'weakref', 'sys', 'copyreg', '_weakref', 'itertools', '_locale', 'abc', '_collections', 'sre_compile', 'enum', 'sre_parse', '_operator', '_collections_abc', '_bootlocale', '_sre', 'sre_constants', '_thread', 'heapq', 'reprlib', 'keyword', '_functools', 're', 'collections', '_weakrefset', 'builtins', 'types'}

Problem

Monitor Python processes in users' Python environments/workstations to determine which unused libs can be removed.

Solution

I'm not sure whether a standalone tool is readily available. strace (as you mentioned in your question) + syslog is an option. I know you mentioned persisting the logs to a NAS, but I would also consider running a lightweight syslog server, since syslog can handle edge cases (e.g. automatic retries) and gives you an interface for filtering logs.
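
If the module names are collected inside Python rather than via strace, the standard library's logging.handlers.SysLogHandler can forward them to such a server. This is only a rough sketch: the logger name, the message prefix, the helper names and the ("localhost", 514) address are placeholders I picked, not values from your setup.

```python
import json
import logging
import logging.handlers
import socket

def make_syslog_logger(address=("localhost", 514)):
    """Return a logger that forwards records to a syslog server over UDP."""
    logger = logging.getLogger("import_telemetry")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = logging.handlers.SysLogHandler(address=address)
        handler.setFormatter(logging.Formatter("import_telemetry: %(message)s"))
        logger.addHandler(handler)
    return logger

def format_modules(modules):
    """Serialize the host name and a sorted module list as one JSON line."""
    return json.dumps({"host": socket.gethostname(), "modules": sorted(modules)})

def report(modules, logger):
    """Send one log line listing the given module names."""
    logger.info(format_modules(modules))
```

Usage would be something like report(set_of_module_names, make_syslog_logger(("your-syslog-host", 514))). Keep in mind that a UDP syslog message has a size limit, so a very long module list may need to be split across several messages.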

References

Syslog Filter Reference: https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.17/administration-guide/52#TOPIC-989744
