File write collisions in parallelized Python

I'm doing some research in neuroscience and I'm using Python's TinyDB library to keep track of all my model training runs and the data they generate.

One issue I realized might come up is when I train multiple models on a cluster: two threads might try to write to the TinyDB JSON file at the same time.

Can someone please let me know if this will be an issue?

Python's processes, threads and coroutines offer synchronization primitives such as locks, RLocks, conditions and semaphores. If your threads access one or more shared variables, every thread should acquire a lock before touching them, so that no other thread can access them at the same time.
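
As a rough illustration of the lock idea, here is a minimal sketch in which several threads record results into one JSON file. The names `record_run`, `results_lock` and `demo` are made up for this example and are not part of TinyDB; the file is handled with the standard `json` module only.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# One lock shared by every thread that touches the results file.
results_lock = threading.Lock()


def record_run(path, run_id, metrics):
    """Add one run's metrics to a JSON file as a guarded read-modify-write."""
    with results_lock:
        # Read, update and write back while holding the lock, so no other
        # thread can interleave its own read or write in between.
        data = json.loads(path.read_text()) if path.exists() else {}
        data[str(run_id)] = metrics
        path.write_text(json.dumps(data))


def demo(path, n_runs=20):
    """Record n_runs results from a thread pool and return the final file contents."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [
            pool.submit(record_run, path, i, {"loss": 1.0 / (i + 1)})
            for i in range(n_runs)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    return json.loads(path.read_text())
```

Note that a `threading.Lock` only coordinates threads inside a single Python process. Jobs launched separately on a cluster are separate processes, so they would not share this lock and would need one of the approaches listed in the other answer.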

Paraphrased question: Can I update a json file concurrently?

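Answer: No, not safely. A plain JSON file gives no protection against two writers at once, so simultaneous updates can overwrite each other or corrupt the file.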

Suggestions:

  1. Use a file locking system to prevent simultaneous read/write of the aggregated results.
  2. Have each unit of work output to its own results file, and run a separate job to aggregate the results as needed.
  3. Use a thread-safe database, e.g. sqlite3.
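
Suggestion 2 could look roughly like the sketch below. The function names, the `run_<id>.json` naming scheme and the record layout are assumptions made for illustration; adapt them to however your jobs are identified.

```python
import json
from pathlib import Path


def write_run_result(results_dir, run_id, metrics):
    """Write one run's metrics to its own file; no other job writes to this path."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / f"run_{run_id}.json"
    tmp_path = results_dir / f"run_{run_id}.json.tmp"
    # Write to a temporary name first, then rename, so the aggregator
    # never picks up a half-written file.
    tmp_path.write_text(json.dumps({"run_id": run_id, "metrics": metrics}))
    tmp_path.replace(out_path)
    return out_path


def aggregate_results(results_dir):
    """Collect every finished run file into one list (run as a separate, single job)."""
    records = []
    for path in sorted(Path(results_dir).glob("run_*.json")):
        records.append(json.loads(path.read_text()))
    return records
```

Each training job would call `write_run_result` once when it finishes. A single aggregation job can then call `aggregate_results` and load the records into the database from one process, for example with TinyDB's `insert_multiple`, so there is never more than one writer.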
