Tuesday 9 March 2010

Syscalls and IPC

The next thing I've been looking at is system calls, and using them to implement IPC.

I wanted system calls to be able to lead directly and efficiently to a context switch, so that in the simple case an IPC call to a server process acts synchronously. But on the other hand, there are other uses for system calls which are really just isolated or system-level function calls, and those don't need the overhead.

So I came up with the idea of two levels - simple system calls which are invoked and return much like any other function invocation, and heavier ones which might lead to a context switch. I then just use the system call number to choose which code to execute.

It's actually easier just showing the code I came up with than trying to explain it.
svc_entry:
        cmp     r7,#SVC_SIMPLE_LAST
        push    { r12, lr }
        adrlo   r12,svc_vectors
        adrlo   lr,svc_exit     @ low vectors, call and return to svc_exit
        ldrlo   pc,[r12, r7, lsl #2]
        ...
svc_exit:
        ldmia   sp!,{r12, pc }^

Well that's the entirety of the code-path for 'simple' system calls. It doesn't strictly need to save r12, but it makes the context-switching case a bit simpler, and keeps the stack aligned properly too. It doesn't need to save the normal EABI work registers, because the target vector will save any it needs to.

At the svc_exit label I initially had `pop { r12, pc }^', thinking that the pop pseudo-op would expand to ldmia as it normally does, and implement an exception return because of the trailing `^' ... but it doesn't ...

The "we might context switch" case is then much the same as every other exception handler, and re-uses some of the code too. It saves the state to the current TCB, and then invokes the vector. It also handles invalid system call numbers by suspending the task.
        srsdb   #MODE_IRQ
        cps     #MODE_IRQ
        stm     sp,{r0-r14}^
        cps     #MODE_SUPERVISOR

        cmp     r7,#SVC_LAST
        adr     r12,svc_vectors
        adr     lr,ex_context_exit
        ldrlo   pc,[r12, r7, lsl #2]
        b       task_error_handler      @ syscall # invalid - suspend task

One of the simplest useful IPC mechanisms I can think of is a signal-based one (the AmigaOS idea of signals, not the Unix one, although they can be made to work the same). If you're waiting on a signal and it arrives you wake up; otherwise it just gets registered asynchronously, and if you wait on it later the wait returns immediately.

It only requires two basic operations, and another two for a bit more flexibility.

Signal signals another task with a single signal bit, and Wait puts the current task to sleep until any one of a given set of signals arrives, or does nothing if it already has. Since either may trigger a context switch, these are implemented using the heavier-weight system call mechanism.

The other two functions are AllocSignal and FreeSignal whose usage is pretty obvious. These are implemented using non-context switching system calls.
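
To make the allocation side concrete, here is a minimal sketch of what AllocSignal/FreeSignal might look like - the `sigAlloc' field, the signatures and the `-1 means any free bit' convention are my assumptions (borrowed from the AmigaOS calls of the same name), not the real code, and in the kernel these would operate on the current task rather than take a tcb argument:

```c
#include <stdint.h>

typedef uint32_t sigmask_t;

/* hypothetical per-task state; field name is illustrative */
struct task {
        sigmask_t sigAlloc;             /* signal bits currently allocated */
};

/* Allocate a specific signal bit if num >= 0, or any free one if num < 0.
   Returns the bit number, or -1 if it isn't available. */
int svc_AllocSignal(struct task *tcb, int num)
{
        if (num >= 0) {
                if (tcb->sigAlloc & ((sigmask_t)1 << num))
                        return -1;      /* already taken */
        } else {
                for (num = 0; num < 32; num++)
                        if (!(tcb->sigAlloc & ((sigmask_t)1 << num)))
                                break;
                if (num == 32)
                        return -1;      /* all 32 bits in use */
        }
        tcb->sigAlloc |= (sigmask_t)1 << num;
        return num;
}

void svc_FreeSignal(struct task *tcb, int num)
{
        tcb->sigAlloc &= ~((sigmask_t)1 << num);
}
```

Since this only touches the calling task's own state and can never block, it is a natural fit for the simple, non-context-switching path.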

The functions are very simple. First Signal: all it does is set the signal bit in the target's task control block and check whether the task was waiting for it. If so, it sets the return value the waiting task expects, adds the task back to the run queue and reschedules it, so it may then execute depending on the scheduling algorithm.
void svc_Signal(int tid, int sigNum) {
        struct task *tcb = findTask(tid);

        if (!tcb)
                return;

        sigmask_t m = (1<<sigNum);
        sigmask_t sigs;

        tcb->sigReceived |= m;
        if (tcb->state == TS_WAIT) {
                sigs = tcb->sigReceived & tcb->sigWaiting;
                if (sigs) {
                        tcb->tcb.regs[0] = sigs;

                        ... move tcb to run queue ...
                        ... reschedule tasks ...
                }
        }
}

Wait is just as simple. In the simplest case, if any of the signals it is waiting on have already arrived, it just returns the intersection of the two sets. Otherwise, the task is put to sleep by placing it on the wait queue, storing the signals it is waiting on. The `trick' here is that when it wakes up, it will just act like it has returned from a context switch - and start executing immediately after the svc instruction. The Signal code above uses that to return the return-code in the delayed case via register r0 - the standard ABI register for function return values.
sigmask_t svc_Wait(sigmask_t sigMask) {
        struct task *tcb = thistask;
        sigmask_t sigs;

        sigs = tcb->sigReceived & sigMask;
        if (sigs) {
                tcb->sigReceived &= ~sigs;
        } else {
                tcb->sigWaiting = sigMask;
                tcb->state = TS_WAIT;

                ... move current task to wait queue ...
                ... reschedule tasks ...
        }

        return sigs;
}
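
Both orderings - signal-then-wait and wait-then-signal - can be checked with a single-threaded model of the logic above. This is just the same bookkeeping with the struct cut down and the queue/reschedule steps dropped; `r0' stands in for `tcb.regs[0]' being set for the woken task:

```c
#include <stdint.h>

typedef uint32_t sigmask_t;
enum { TS_RUN, TS_WAIT };

struct task {                           /* cut-down model of the TCB */
        int state;
        sigmask_t sigReceived, sigWaiting;
        sigmask_t r0;                   /* stands in for tcb.regs[0] */
};

/* svc_Signal minus the queue handling */
static void model_Signal(struct task *tcb, int sigNum)
{
        tcb->sigReceived |= (sigmask_t)1 << sigNum;
        if (tcb->state == TS_WAIT) {
                sigmask_t sigs = tcb->sigReceived & tcb->sigWaiting;
                if (sigs) {
                        tcb->r0 = sigs; /* return value seen after wake-up */
                        tcb->state = TS_RUN;
                }
        }
}

/* svc_Wait minus the queue handling */
static sigmask_t model_Wait(struct task *tcb, sigmask_t sigMask)
{
        sigmask_t sigs = tcb->sigReceived & sigMask;
        if (sigs) {
                tcb->sigReceived &= ~sigs;
        } else {
                tcb->sigWaiting = sigMask;
                tcb->state = TS_WAIT;   /* would sleep here */
        }
        return sigs;
}
```

In the first ordering Wait returns immediately with the intersection; in the second the task `sleeps' and Signal later flips it back to runnable with the result in r0.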

Invoking system calls uses the svc instruction. This encodes the system-call number in the instruction, but you can also pass it in a register - and I follow the Linux EABI, which puts the system call number in r7.

With this mechanism it is trivial to implement a system-call stub in assembly language - tell the compiler what the args are, and the assembly just passes them through to the system.

C code:
extern int Wait(int sigmask);
ASM:
Wait:   push    { r7 }
        mov     r7,#3
        svc     #3
        pop     { r7 }
        bx      lr

But this won't let the compiler inline the calls, which would be desirable in many cases, so one must resort to inline asm in the C source.
sigmask_t Wait(sigmask_t sigMask) {
        register int res asm("r0");
        register int sigm asm("r0") = sigMask;

        asm volatile("mov r7,#3; svc #3"
                     : "=r" (res)
                     : "r" (sigm)
                     : "r7", "r1", "r2", "r3", "r12");

        return res;
}

(I'm not sure if that is 100% correct - but it seems to compile correctly.)

Code in ipc-signal.c and another version of context.S.

Other IPC

Well beyond simple signals there are a few other possibilities:
  • Mailboxes - These are usually a fixed-length queue which can be written to asynchronously if it is not full, but only with simple data types such as an integer or pointer. No dynamic memory is required inside the kernel so it is easy to implement. Shared memory and the like is required to send more complex information.
  • Shared memory - Fairly simple once you have a mechanism to share memory. No way to send or receive notifications about state changes though, so it requires polling (bad for lots of reasons) or other IPC primitives such as signals. It also requires some sort of serialisation mechanism, such as mutexes or semaphores, which are best avoided.
  • Synchronous message passing - This is used to implement a light-weight context switch in microkernels. Because the callee and caller rendezvous at the same time, there's no need to save state beyond the normal state saved in a function call, which simplifies scheduling. If asynchronous operation or multi-processing is required, threads are needed though, and passing large messages can be expensive.
  • Asynchronous message passing - This uses a queue to keep track of the received messages and allows the target to receive them when ready. If the receiver is ready and waiting it can work much like the synchronous case; otherwise the sender can keep doing work. In a non-protected/non-virtual memory environment this is easy and efficient to implement, but both of those add complications. For example, dynamic memory is needed in the kernel, and messages saved to a kernel queue involve a double copy - although it can be implemented without copying too.
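
A fixed-length mailbox of the kind described above is little more than a ring buffer of words. A minimal sketch - the names and the size are illustrative, and a real kernel version would disable interrupts (or take a lock) around the queue operations and signal the reader on put:

```c
#include <stdint.h>

#define MBOX_SIZE 8                     /* must be a power of two */

struct mailbox {
        uint32_t slots[MBOX_SIZE];
        unsigned head, tail;            /* free-running; head == tail means empty */
};

/* returns 0 on success, -1 if the mailbox is full */
int mbox_put(struct mailbox *mb, uint32_t msg)
{
        if (mb->head - mb->tail >= MBOX_SIZE)
                return -1;
        mb->slots[mb->head++ & (MBOX_SIZE - 1)] = msg;
        return 0;
}

/* returns 0 on success, -1 if the mailbox is empty */
int mbox_get(struct mailbox *mb, uint32_t *msg)
{
        if (mb->head == mb->tail)
                return -1;
        *msg = mb->slots[mb->tail++ & (MBOX_SIZE - 1)];
        return 0;
}
```

No dynamic memory, no copying beyond the word itself - which is exactly why mailboxes are attractive.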

I'm fond of the async message passing model, particularly with non-copying of messages - I used it in Evolution and other threaded code I've written since. So I'd like to try to get that working efficiently in a protected/virtual memory environment if I can - I had it working before on x86. I don't know if it worked terribly efficiently, but I don't see why it should be too bad.

Mailboxes also look interesting - mostly because of their simplicity. They can convey more information than a single bit as signals do, but I'm not sure they could replace them (the Cell BE hardware implements mailboxes plus a bit-based system much the same as signals - and if they spent the hardware to do both, there's probably a good reason).

Once I get some mechanism to pass data between tasks I guess I'll look at something a bit more meaty, like a device driver - and go back to banging my head against that gigantic OMAP manual.
