
Suggestions on solving a locking problem with the CCL Lisp compiler on ARMv7 Raspberry Pi?

Background: CCL, a.k.a. OpenMCL, is a very nice, venerable, lightweight but fairly fast Common Lisp compiler. It's an excellent match for the RPi because it runs on the 32-bit models and isn't too memory-intensive. In theory, unlike the heavier SBCL Lisp compiler, it supports threads on the 32-bit RPi. But it has a long-standing mutex bug.

However, this is an ARM machine language question, not a Lisp question. I'm hoping an ARM expert will read this and have an Aha! moment from a different context.

The problem is that CCL suffers from a fatal flaw on the Raspberry Pi 2 and 3 (and probably the 4), as well as other ARM boards: when using threading and locking, it fails with memory corruption. The threading failure has been known for years.

I believe I have isolated the issue further, to a locking failure: when a CCL thread grabs a lock (mutex) and then checks whether it owns the lock, it sometimes turns out that another thread owns it. Threads seem to steal each other's locks, which would be fatal for garbage collection, among other things. It looks as though the news that one core has taken a lock does not propagate to the other cores before they grab it themselves (a race condition). The bug does not happen on single-core RPis such as the Pi Zero.
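To make the failure mode concrete, the check boils down to something like the sketch below, written in C with pthreads rather than CCL's own locks (the names here are mine, purely for illustration). With a correct mutex the assertion can never fire; the Lisp equivalent of this check is what fails on the multi-core Pi.

    /* Illustration only: the same ownership check expressed with pthreads.
     * Each thread takes the lock, records itself as the owner, and then
     * verifies that it still owns the lock before releasing it. */
    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS   4
    #define ITERATIONS 100000

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_t owner;          /* written only while `lock` is held */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < ITERATIONS; i++) {
            pthread_mutex_lock(&lock);
            owner = pthread_self();                       /* claim the lock */
            /* ... a little work inside the critical section ... */
            assert(pthread_equal(owner, pthread_self())); /* do I still own it? */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        puts("no lock stealing observed");
        return 0;
    }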

I've explored this bug in this GitHub repo. The relevant function is (threadtest2), which spawns threads, takes locks, and checks lock ownership. I initially thought the problem might be a missing DMB instruction in the locking code; DMB "ensures that the exclusive write is synchronized to all processors". So I put DMB instructions all over the locking code (though on closer inspection DMB was already there in a few spots, so the original compiler author had thought of this).

In detail, I put DMBs into just about every locking routine in arm-misc.lisp called from the futex-free version of %get-spin-lock, which is called by %lock-recursive-lock-ptr in ARM/l0-misc.lisp, with no luck. The low-level function in ARM/l0-misc.lisp is %ptr-store-fixnum-conditional. It doesn't use DMB; it uses the LDREX/STREX exclusive-access instructions to do the atomic update.

[edit] As user coredump points out below, DMB is indeed necessary on multi-core systems, according to blogs and the ARM docs, though there is some disagreement about how many places it should appear in: only after the STREX, or also before the LDREX.
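For reference, here is a minimal sketch of the acquire-side placement as I read the ARM documentation (the spin-lock example in the Cortex-A Programmer's Guide): no barrier before the LDREX, and a single DMB after the STREX has succeeded, before entering the critical section. This is plain C with GCC inline assembly for ARMv7, purely illustrative and not CCL's code; it assumes an ARMv7 target (e.g. -march=armv7-a).

    /* Acquire side of a toy ARMv7 spin lock (0 = free, 1 = held).
     * Barrier placement per the ARM docs: the one DMB comes after the
     * successful STREX, so nothing in the critical section can be
     * observed before the lock word is seen as taken. */
    static inline void spin_lock(volatile unsigned int *lockword)
    {
        unsigned int status, one = 1;
        __asm__ volatile(
            "1:  ldrex   %0, [%2]      \n\t" /* read lock word, mark exclusive */
            "    cmp     %0, #0        \n\t" /* already held?                  */
            "    bne     1b            \n\t" /*   yes: spin (a real lock would back off or WFE) */
            "    strex   %0, %1, [%2]  \n\t" /* try to write 1; %0 = 0 on success */
            "    cmp     %0, #0        \n\t"
            "    bne     1b            \n\t" /* lost exclusivity: retry        */
            "    dmb                   \n\t" /* acquire barrier, after STREX   */
            : "=&r"(status)
            : "r"(one), "r"(lockword)
            : "cc", "memory");
    }

CCL's %ptr-store-fixnum-conditional (annotated below) has the same LDREX / CMP / STREX / retry shape, so if something is missing it would presumably be barrier placement rather than the exclusive access itself.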

Obviously, I'm not asking anyone to diagnose this compiler. My question is:

Does this lock-stealing behavior ring a bell? Has anyone else seen this kind of lock-stealing or race condition on ARM in another context, and found a solution? Is there something I'm missing about DMB, or is another instruction needed?


As an addendum, here is my annotation of the part where it might be failing, in ARM/l0-misc.lisp, function %ptr-store-fixnum-conditional (this is machine code written in CCL's Lisp assembler format). I inserted some DMBs, marked with comments below, and it didn't help.

    ;; this is the function used to grab the mutex, using ldrex/strex
    ;; to set a memory location atomically
    (defarmlapfunction %ptr-store-fixnum-conditional
        ((ptr arg_x) (expected-oldval arg_y) (newval arg_z))
      (let ((address imm2)            ;; define some variables
            (actual-oldval imm1))
        (macptr-ptr address ptr)
        @again
        (DMB)  ;; my new DMB, according to Chen's blog (not the ARM manual)
        ;; first, load the word from memory with ldrex,
        ;;   initializing the atomic memory operation
        ;;   and claiming control of this memory for this core
        (ldrex actual-oldval (:@ address))
        ;; if the actual-oldval is wrong, then give up on
        ;;   this pointer store because the lock is taken
        ;;   (looping higher up in the code until it is free)
        (cmp actual-oldval expected-oldval)
        (bne @done)
        ;;
        ;; 2nd part of the exclusive memory access:
        ;;   store newval into memory and put a flag into imm0
        (strex imm0 newval (:@ address))
        ;; if the exclusive store failed, another core messed
        ;;   with memory, so loop for another ldrex/strex cycle
        (cmp imm0 (:$ 0))
        (bne @again)
        (DMB)  ;; my new DMB after the conditional jump
        ;; success: the lock was obtained (and exclusive access
        ;;   was cleared by strex)
        (mov arg_z actual-oldval)
        (bx lr)   ;; return to caller in case of a good mutex grab
        @done
        ;; clear exclusive access if the lock grab failed
        (clrex)
        (mov arg_z actual-oldval)
        (DMB)  ;; my new DMB.  Maybe not needed?
        (bx lr)))  ;; return to caller in case of a failed mutex grab

Addendum: Once again, I tried putting DMB around every LDREX/STREX, and it didn't help. I also tried putting a DMB into every %SET-xxx function, following the ARM docs on releasing mutexes, but this was harder to trace: I couldn't find where %%set-unsigned-long was defined even after grepping the whole source tree, so I blindly stuffed a DMB before every STR instruction inside the %SET-xxx functions.
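For comparison, the release-side placement described in the ARM docs is just a barrier followed by an ordinary store, roughly the sketch below (same illustrative ARMv7/GCC setup as the acquire sketch above, not CCL's %SET-xxx code):

    /* Release side of the toy spin lock: drain the writes made inside the
     * critical section with DMB, then clear the lock word with a plain STR. */
    static inline void spin_unlock(volatile unsigned int *lockword)
    {
        unsigned int zero = 0;
        __asm__ volatile(
            "    dmb               \n\t" /* make prior writes visible first */
            "    str     %1, [%0]  \n\t" /* then mark the lock free (0)     */
            :
            : "r"(lockword), "r"(zero)
            : "memory");
    }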

I believe that CCL uses the system-level futex on other platforms, and does its own custom locking only (?) on ARM, if that's another clue. Maybe the whole thing could be fixed by using the OS-supplied futex? Maybe no other platform uses the custom locks, so ARM is just the first (multi-core) system to show the breakage?
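If the OS route turned out to be viable, the kernel-level primitive is not much code. The sketch below is the standard Linux futex idiom (see futex(2) and Drepper's "Futexes Are Tricky"), written in C11; it is not CCL's code, and I don't know how hard it would be to graft onto CCL's lock objects, but it shows what "let the kernel and the atomics handle the barriers" would mean:

    /* Minimal futex-backed lock, the standard three-state idiom:
     * 0 = free, 1 = locked, 2 = locked with (possible) waiters.
     * The C11 atomics supply the memory barriers; the kernel handles
     * sleeping and waking.  No error handling, illustration only. */
    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    typedef _Atomic int futex_lock_t;

    static long futex(futex_lock_t *uaddr, int op, int val)
    {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    static void flock_acquire(futex_lock_t *l)
    {
        int c = 0;
        if (atomic_compare_exchange_strong(l, &c, 1))   /* fast path: 0 -> 1    */
            return;
        if (c != 2)
            c = atomic_exchange(l, 2);                  /* advertise contention */
        while (c != 0) {
            futex(l, FUTEX_WAIT, 2);                    /* sleep until woken    */
            c = atomic_exchange(l, 2);                  /* try to take it again */
        }
    }

    static void flock_release(futex_lock_t *l)
    {
        if (atomic_exchange(l, 0) == 2)                 /* someone may be asleep */
            futex(l, FUTEX_WAKE, 1);                    /* wake one waiter       */
    }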


To see if it helps, you can try adding a DMB instruction before the LDREX instruction and after the STREX instruction. DMB is a memory barrier instruction that ensures the exclusive write is synchronized to all processors. It is described in the ARM Architecture Reference Manual.
