Wednesday, December 8, 2010

Building Mongrel2 on ARM architecture

This past weekend I got mongrel2 to build and run on my sheevaplug. Not really rocket science, but in case anyone is interested or has some more experience (more on that at the end of the post), I'll document what I did.

First thing: installing the prerequisites. That was easy, just apt-get them, same as anywhere else.

So, then I kick off the make and get:

cc -g -O2 -Wall -Isrc -DNDEBUG -c -o src/task/context.o src/task/context.c
src/task/context.c: In function 'makecontext':
src/task/context.c:97: error: 'mcontext_t' has no member named 'gregs'
src/task/context.c:101: error: 'mcontext_t' has no member named 'gregs'
src/task/context.c:102: error: 'mcontext_t' has no member named 'gregs'
make: *** [src/task/context.o] Error 1

This does not look cool. Let's see the offending part of the code in src/task/context.c:

void makecontext(ucontext_t *uc, void (*fn)(void), int argc, ...)
{
    int i, *sp;
    va_list arg;

    sp = (int*)uc->uc_stack.ss_sp+uc->uc_stack.ss_size/4;
    va_start(arg, argc);

    for(i=0; i<4 && i<argc; i++)
        uc->uc_mcontext.gregs[i] = va_arg(arg, uint);

    uc->uc_mcontext.gregs[13] = (uint)sp;
    uc->uc_mcontext.gregs[14] = (uint)fn;
}

Okay, this is not too shabby - it means this is ARM-specific code. But if this is ARM-specific code, how come the type does not match? And why do we need this code to begin with, when "man makecontext" documents the function? A quick Google search says that makecontext is not defined on the ARM architecture, obsoleted API, bla bla bla. Anyway, let's test what they mean by "not defined" with a dumb app.

ayourtch@ubuntu:~/test$ cat test.c

#include <ucontext.h>

int main(int argc, char *argv[]) {
    makecontext(0, 0, 0);
}

ayourtch@ubuntu:~/test$ gcc test.c
/tmp/ccOO8Fqq.o: In function `main':
test.c:(.text+0x24): warning: warning: makecontext is not implemented and will always fail
ayourtch@ubuntu:~/test$ gdb a.out
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi"...
(gdb) disass makecontext
Dump of assembler code for function makecontext:
0x000082e8 <makecontext+0>: add r12, pc, #0 ; 0x0
0x000082ec <makecontext+4>: add r12, r12, #32768 ; 0x8000
0x000082f0 <makecontext+8>: ldr pc, [r12, #3368]!
End of assembler dump.

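Okay, they really mean it. To be fair, the three instructions above are only the PLT trampoline in a.out that jumps into libc, so the disassembly by itself proves little; it is the linker warning that tells us the libc implementation is just a plug that always fails. No-one's home. So what they say on the interwebs is true, and that's why we're dragging along this ARM-only function implementation.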

Let's take a look at where this structure is defined.

$ grep -R -A 10 mcontext_t /usr/include
/usr/include/sys/ucontext.h:typedef struct sigcontext mcontext_t;
/usr/include/sys/ucontext.h: mcontext_t uc_mcontext;
/usr/include/signal.h:/* This will define `ucontext_t' and `mcontext_t'. */

So /usr/include/sys/ucontext.h defines mcontext_t as follows:

typedef struct sigcontext mcontext_t;

And the sigcontext is defined in /usr/include/asm/sigcontext.h:

struct sigcontext {
    unsigned long trap_no;
    unsigned long error_code;
    unsigned long oldmask;
    unsigned long arm_r0;
    unsigned long arm_r1;
    unsigned long arm_r2;
    unsigned long arm_r3;
    unsigned long arm_r4;
    unsigned long arm_r5;
    unsigned long arm_r6;
    unsigned long arm_r7;
    unsigned long arm_r8;
    unsigned long arm_r9;
    unsigned long arm_r10;
    unsigned long arm_fp;
    unsigned long arm_ip;
    unsigned long arm_sp;
    unsigned long arm_lr;
    unsigned long arm_pc;
    unsigned long arm_cpsr;
    unsigned long fault_address;
};

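As such, we need to adjust the code accordingly: use the arm_XXX named members instead of the array (but the array was so convenient!). Index 13 is the stack pointer and index 14 is the link register, so those two become arm_sp and arm_lr.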
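
Roughly, the adjustment looks like the sketch below. It is not the actual diff: to keep it compilable on any machine it works on a local copy of the structure (sketch_sigcontext), takes the stack as explicit parameters, and reads the arguments as unsigned long rather than uint. All the sketch_ names are made up for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in with the same field order as the ARM struct sigcontext,
   so the sketch does not depend on ARM system headers (assumption). */
struct sketch_sigcontext {
    unsigned long trap_no, error_code, oldmask;
    unsigned long arm_r0, arm_r1, arm_r2, arm_r3;
    unsigned long arm_r4, arm_r5, arm_r6, arm_r7;
    unsigned long arm_r8, arm_r9, arm_r10;
    unsigned long arm_fp, arm_ip, arm_sp, arm_lr, arm_pc, arm_cpsr;
    unsigned long fault_address;
};

/* Example entry point for a new context (hypothetical). */
static void sketch_entry(void) {}

/* Replaces "gregs[i] = value" for the four argument registers r0..r3. */
static void sketch_set_arg(struct sketch_sigcontext *mc, int i,
                           unsigned long value)
{
    assert(i >= 0 && i < 4);
    switch (i) {
    case 0: mc->arm_r0 = value; break;
    case 1: mc->arm_r1 = value; break;
    case 2: mc->arm_r2 = value; break;
    case 3: mc->arm_r3 = value; break;
    }
}

/* Hypothetical variant of makecontext() using the named members. */
static void sketch_makecontext(struct sketch_sigcontext *mc,
                               void *stack, size_t stack_size,
                               void (*fn)(void), int argc, ...)
{
    va_list arg;
    int i;

    va_start(arg, argc);
    for (i = 0; i < 4 && i < argc; i++)
        sketch_set_arg(mc, i, va_arg(arg, unsigned long));
    va_end(arg);

    /* was gregs[13]: the stack pointer starts at the top of the stack */
    mc->arm_sp = (unsigned long)(uintptr_t)((char *)stack + stack_size);
    /* was gregs[14]: the link register holds the entry point */
    mc->arm_lr = (unsigned long)(uintptr_t)fn;
}
```

A call such as sketch_makecontext(&mc, stack, sizeof stack, sketch_entry, 2, 10UL, 20UL) leaves 10 in arm_r0 and 20 in arm_r1, with arm_sp pointing just past the end of the stack buffer.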

After this everything compiles, but alas, a segfault happens in the very first test.

Why? Because libtask's ARM code assumes that mcontext_t starts with the R0 contents, but on the sheevaplug there are 12 bytes of other content in front of it: the trap_no, error_code and oldmask members of struct sigcontext, 4 bytes each.
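
The 12 bytes can be checked with offsetof. This is only a sketch: both structures are local stand-ins, uint32_t models the 4-byte unsigned long of 32-bit ARM, and the 16-entry gregs array is my assumption about the layout libtask expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout the libtask ARM code expects: registers from offset 0
   (the array size of 16 is an assumption). */
struct sketch_libtask_regs {
    uint32_t gregs[16];
};

/* Start of the ARM struct sigcontext, up to and including arm_r0. */
struct sketch_sigcontext_head {
    uint32_t trap_no;
    uint32_t error_code;
    uint32_t oldmask;
    uint32_t arm_r0;
};

/* How far R0 is from where the libtask code looks for it, in bytes. */
static size_t sketch_r0_skew(void)
{
    size_t expected = offsetof(struct sketch_libtask_regs, gregs);
    size_t actual = offsetof(struct sketch_sigcontext_head, arm_r0);

    assert(actual >= expected);
    return actual - expected;
}
```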

My first attempt was to modify the assembler code to add 12 to the structure pointer on use. In the end I came up with what I think is a (slightly) better approach: instead of passing the address of the mcontext_t structure, on the ARM architecture I pass the address of the R0 member within the structure, typecast to (void *).

This avoids the need to dig into assembler code.
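
To illustrate the idea (again a sketch, not the real diff): sketch_save_regs below is a plain C stand-in for the assembler routine, which treats its argument as a block of 16 registers starting at R0. The structure is a local copy using uint32_t for the 32-bit ARM unsigned long, and the SKETCH_ and sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_NREGS = 16 };  /* r0..r10, fp, ip, sp, lr, pc */

/* Local stand-in for the ARM struct sigcontext. */
struct sketch_arm_sigcontext {
    uint32_t trap_no, error_code, oldmask;  /* the 12 leading bytes */
    uint32_t arm_r0, arm_r1, arm_r2, arm_r3;
    uint32_t arm_r4, arm_r5, arm_r6, arm_r7;
    uint32_t arm_r8, arm_r9, arm_r10;
    uint32_t arm_fp, arm_ip, arm_sp, arm_lr, arm_pc;
    uint32_t arm_cpsr, fault_address;
};

/* Address of the R0 member, same value as (void *)&(mc)->arm_r0. */
#define SKETCH_REGS(mc) \
    ((void *)((char *)(mc) + offsetof(struct sketch_arm_sigcontext, arm_r0)))

/* C stand-in for the assembler save routine: it knows nothing about
   the surrounding structure and just stores 16 registers from regs. */
static void sketch_save_regs(void *regs, const uint32_t live[SKETCH_NREGS])
{
    assert(regs != NULL);
    memcpy(regs, live, SKETCH_NREGS * sizeof live[0]);
}
```

Calling sketch_save_regs(SKETCH_REGS(&mc), live) fills arm_r0 through arm_pc and leaves the three leading members and arm_cpsr alone, which is the effect the pointer adjustment is after.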

Now I can successfully run mongrel2 on sheevaplug.

However, this leaves me with a question.

Surely the original code ran on *some* ARM. So if the code with the diff breaks the build for you - let me know, so we can fix it properly for everyone - I do not have other hardware.
