| ===================================== |
| FUJITSU FR-V KERNEL ATOMIC OPERATIONS |
| ===================================== |
| |
| On the FR-V CPUs, there is only one atomic Read-Modify-Write operation: the SWAP/SWAPI |
| instruction. Unfortunately, this alone can't be used to implement the following operations: |
| |
| (*) Atomic add to memory |
| |
| (*) Atomic subtract from memory |
| |
| (*) Atomic bit modification (set, clear or invert) |
| |
| (*) Atomic compare and exchange |
| |
| On such CPUs, the standard way of emulating these operations in uniprocessor mode is to disable |
| interrupts around them. On the FR-V CPUs, however, modifying the PSR takes a lot of clock cycles, |
| and it has to be done twice: once to disable interrupts and once to restore them. This means the |
| CPU runs for a relatively long time with interrupts disabled, which can noticeably increase |
| interrupt latency. |
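| |
| As a rough illustration of that conventional approach, the C sketch below models an atomic add |
| performed with interrupts disabled. It is a user-space model, not FR-V kernel code: the model_* |
| names are hypothetical, and the PSR is represented only by a flag and a write counter so that the |
| two PSR modifications per operation are visible. |
| |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PSR interrupt mask; not real FR-V code. */
static bool model_irqs_enabled = true;
static unsigned int model_psr_writes;	/* counts modelled PSR modifications */

/* Disable interrupts, returning the previous state (first PSR write). */
static bool model_irq_save(void)
{
	bool was_enabled = model_irqs_enabled;

	model_irqs_enabled = false;
	model_psr_writes++;
	return was_enabled;
}

/* Put the interrupt state back (second PSR write). */
static void model_irq_restore(bool was_enabled)
{
	model_irqs_enabled = was_enabled;
	model_psr_writes++;
}

/* Add i to *v and return the new value, with interrupts masked throughout. */
static int model_add_return_irqs_off(int i, int *v)
{
	bool flags;
	int val;

	assert(v != NULL);
	flags = model_irq_save();
	val = *v + i;		/* load and modify */
	*v = val;		/* store */
	model_irq_restore(flags);
	return val;
}
```

| |
| Each call costs two modelled PSR writes, and the load, modify and store all run with interrupts |
| masked. |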
| |
| |
| ============= |
| NEW ALGORITHM |
| ============= |
| |
| To get around this, the following algorithm has been implemented. It operates in a way similar to |
| the LL/SC instruction pairs supported on a number of platforms. |
| |
| (*) The CCCR.CC3 register is reserved within the kernel to act as an atomic modify abort flag. |
| |
| (*) In the exception prologues run on kernel->kernel entry, CCCR.CC3 is set to 0 (Undefined |
| state). |
| |
| (*) All atomic operations can then be broken down into the following algorithm: |
| |
| (1) Set ICC3.Z to true and set CC3 to True (ORCC/CKEQ/ORCR). |
| |
| (2) Load the value currently in the memory to be modified into a register. |
| |
| (3) Make changes to the value. |
| |
| (4) If CC3 is still True, simultaneously and atomically (by VLIW packing): |
| |
| (a) Store the modified value back to memory. |
| |
| (b) Set ICC3.Z to false (CORCC on GR29 is sufficient for this - GR29 holds the current |
| task pointer in the kernel, and so is guaranteed to be non-zero). |
| |
| (5) If ICC3.Z is still true, go back to step (1). |
| |
| This works in a non-SMP environment because any interrupt or other exception that happens between |
| steps (1) and (4) will set CC3 to the Undefined state, thus aborting the store in (4a) and leaving |
| the Z flag in ICC3 set, which in turn causes step (5) to loop back to step (1). |
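| |
| The steps above can be modelled in portable C to show the control flow. This is only a sketch: the |
| sim_* names are hypothetical, CC3 and ICC3.Z are represented by plain booleans, and the exception |
| is injected by hand between the load and the store rather than arriving asynchronously. The |
| simulated handler adds 100 to the same word to show that its update is not overwritten. |
| |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models CCCR.CC3: true is the True state, false is the Undefined state. */
static bool sim_cc3;

/* Models the kernel->kernel exception prologue, which clears CC3. */
static void sim_exception_prologue(void)
{
	sim_cc3 = false;
}

/*
 * Add i to *v and return the new value.  To exercise the abort path, one
 * simulated exception is injected per attempt while 'exceptions' is non-zero;
 * each one adds 100 to *v, standing in for a handler touching the same word.
 * '*attempts' counts passes through the loop.
 */
static int sim_atomic_add_return(int i, int *v, int exceptions, int *attempts)
{
	bool icc3_z;
	int val;

	assert(v != NULL && attempts != NULL);
	do {
		icc3_z = true;		/* (1) set ICC3.Z */
		sim_cc3 = true;		/* (1) set CC3 to True */
		val = *v;		/* (2) load */
		val += i;		/* (3) modify */

		if (exceptions > 0) {	/* exception between (1) and (4) */
			exceptions--;
			*v += 100;
			sim_exception_prologue();
		}

		if (sim_cc3) {		/* (4) both or neither */
			*v = val;	/* (4a) store */
			icc3_z = false;	/* (4b) clear ICC3.Z */
		}
		(*attempts)++;
	} while (icc3_z);		/* (5) retry if aborted */

	return val;
}
```

| |
| Starting from 5 and adding 3 with two injected exceptions, the loop runs three times and the final |
| value is 208: each aborted attempt discards its stale result and reloads the word. |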
| |
| |
| This algorithm suffers from two problems: |
| |
| (1) The condition CCCR.CC3 is cleared unconditionally by an exception, irrespective of whether or |
| not any changes were made to the target memory location during that exception. |
| |
| (2) The branch from step (5) back to step (1) may have to be taken more than once before the store |
| manages to take place. In theory, this loop could cycle forever if interrupts arrive often |
| enough to abort every attempt, but this is unlikely. |
| |
| |
| ======= |
| EXAMPLE |
| ======= |
| |
| Taking an example from include/asm-frv/atomic.h: |
| |
| static inline int atomic_add_return(int i, atomic_t *v) |
| { |
| unsigned long val; |
| |
| asm("0: \n" |
| |
| It starts by setting ICC3.Z to true for later use, and also transforming that into CC3 being in the |
| True state. |
| |
| " orcc gr0,gr0,gr0,icc3 \n" <-- (1) |
| " ckeq icc3,cc7 \n" <-- (1) |
| |
| Then it does the load. Note that the final phase of step (1) is done at the same time as the |
| load. The VLIW packing ensures they are done simultaneously. The ".p" on the load must not be |
| removed without swapping the order of these two instructions. |
| |
| " ld.p %M0,%1 \n" <-- (2) |
| " orcr cc7,cc7,cc3 \n" <-- (1) |
| |
| Then the proposed modification is generated. Note that the old value can be retained if required |
| (such as in test_and_set_bit()). |
| |
| " add%I2 %1,%2,%1 \n" <-- (3) |
| |
| Then it attempts to store the value back, contingent on no exception having cleared CC3 since it |
| was set to True. |
| |
| " cst.p %1,%M0 ,cc3,#1 \n" <-- (4a) |
| |
| It simultaneously records the success or failure of the store in ICC3.Z. |
| |
| " corcc gr29,gr29,gr0 ,cc3,#1 \n" <-- (4b) |
| |
| The branch back to the start is then taken if the operation was aborted. |
| |
| " beq icc3,#0,0b \n" <-- (5) |
| : "+U"(v->counter), "=&r"(val) |
| : "NPr"(i) |
| : "memory", "cc7", "cc3", "icc3" |
| ); |
| |
| return val; |
| } |
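| |
| The remark about retaining the old value can be illustrated with a C model of a |
| test_and_set_bit()-style operation. This is a sketch of the pattern only, not the FR-V |
| implementation: the tas_* names are hypothetical, CC3 and ICC3.Z are plain booleans, and nothing |
| in this single-threaded model ever clears CC3. |
| |

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Models CC3; nothing in this single-threaded sketch ever clears it. */
static volatile bool tas_cc3;

/* Set bit nr in *addr and return whether it was already set. */
static int tas_test_and_set_bit(unsigned int nr, unsigned long *addr)
{
	unsigned long mask, old, updated;
	bool icc3_z;

	assert(addr != NULL);
	assert(nr < CHAR_BIT * sizeof(unsigned long));
	mask = 1UL << nr;
	do {
		icc3_z = true;		/* (1) */
		tas_cc3 = true;		/* (1) */
		old = *addr;		/* (2) load */
		updated = old | mask;	/* (3) modify into a second register */
		if (tas_cc3) {		/* (4) conditional store and flag update */
			*addr = updated;
			icc3_z = false;
		}
	} while (icc3_z);		/* (5) */

	return (old & mask) != 0;	/* result comes from the retained old value */
}
```

| |
| The modified value goes into a separate variable so that the loaded value survives to form the |
| return value. |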
| |
| |
| ============= |
| CONFIGURATION |
| ============= |
| |
| The atomic ops implementation can be made inline or out-of-line by changing the |
| CONFIG_FRV_OUTOFLINE_ATOMIC_OPS configuration variable. Making it out-of-line has a number of |
| advantages: |
| |
| - The resulting kernel image may be smaller |
| - Debugging is easier as atomic ops can just be stepped over and can have breakpoints set on them |
| |
| Keeping it inline also has a number of advantages: |
| |
| - The resulting kernel may be faster: |
| - no out-of-line function calls need to be made |
| - the compiler doesn't have half its registers clobbered by making a call |
| |
| The out-of-line implementations live in arch/frv/lib/atomic-ops.S. |
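| |
| The effect of such a switch can be sketched in C as below. How the kernel headers actually use |
| CONFIG_FRV_OUTOFLINE_ATOMIC_OPS is an assumption here; cfg_add_return() is a hypothetical name and |
| its body is a plain, non-atomic placeholder. Only the selection between an inline definition and |
| an external declaration is being illustrated. |
| |

```c
#include <assert.h>
#include <stddef.h>

/* Define this (for example with -D) to model the out-of-line build. */
/* #define CONFIG_FRV_OUTOFLINE_ATOMIC_OPS 1 */

#ifdef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS
/* Header side: callers see only a declaration and make a function call. */
int cfg_add_return(int i, int *v);

/* Implementation side: would live in a separate file in a real build. */
int cfg_add_return(int i, int *v)
{
	assert(v != NULL);
	*v += i;	/* plain C placeholder, not atomic */
	return *v;
}
#else
/* Inline build: the body is expanded at each call site. */
static inline int cfg_add_return(int i, int *v)
{
	assert(v != NULL);
	*v += i;	/* plain C placeholder, not atomic */
	return *v;
}
#endif
```

| |
| Callers are written the same way in both configurations; only the generated code differs. |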