Thursday 21 June 2012

arm, tegra3, neon vfp, ffmpeg and crashes

So I just did a release of jjmpeg including the android player ... and then a few hours later finally discovered the cause of crashes I've been having ...

Either FFmpeg, the android sdk compiler or the Tegra 3 processor (or the way i'm using any of them) has some sort of issue which causes bus errors fairly regularly but never repeatably. Possibly because of mis-aligned accesses. Unfortunately when I compile without optimisations on - the build fails, which makes it a bit hard to debug ... i got gdb to run (once only though, subsequent runs fail), and got a half-decent backtrace, but optimistions obscured important details.

Anyway i noticed that 0.11.1 has a bunch of ARM work, so I upgraded the FFmpeg build, and mucked about with the build options for an hour trying to suss out the right ones and to see how various ones worked.

Short story is that using armeabi-7a causes the crash to appear (with any sort of float, vfp, neon, or soft), and dropping back to armeabi fixes everything.

Unless I can get better debugging results I think i'll just stick with armeabi for the foreseeable future. I can't find anything recent about these types of problems, so perhaps it's just my configuration but I really just don't know enough about the ARM specifics at this point to tell either way.

Well, that's enough for today.

Update: I spent another day or so on this and finally nutted it out. It was due to alignment problems - but it was odd that it happened so rarely.

As best I could work out, ARMV6+ allows non-aligned memory accesses, but the standard ARM system control module can be programmed to cause faults. And just to complicate matters the ARM linux kernel has the ability to handle the faults and implement the mis-aligned access manually, and this the behaviour can be configured at run-time via proc. It seems the kernel on my tablet is configured to cause faults, and not having administrative access I am unable to change it ...

So the problem is that FFmpeg's configure script assumes mis-aligned memory accesses are safe if you're using armv6 or higher. Anyway I filed a bug although so far indications are that the bug triager doesn't know what I filed (see update 2) - i'm not fussed as I have a work-around that doesn't require any patch.

I had wasted a lot of time based on thinking it was neon or optimisation related, whereas it was just ARMv5 vs anything else behaviour. When i finally did get it to compile without optimisation turned on, the backtrace I got was still identical and so worrying about getting a good backtrace was pointless. I had wrongly assumed that a modern cpu would handle mis-aligned accesses ok, not working at the assembly language level for a while gets you rusty ...

I suppose the main upshot of posting on the libav-user list ... that mostly just resulted in me wasting a full day of fucking about ... is that I realised my configure invocation was still broken (more problems one got from copying some random configure script from the net) and so I managed to clean it up further.

Bit over it all now.

Update: So the actual fix was to run this sed command over config.h after configure is executed:

sed -i -e 's/ HAVE_FAST_UNALIGNED 1/ HAVE_FAST_UNALIGNED 0/' $buildir/config.h

Update 2: Good-o, they've just added a configure option to override it.

Update 3: Can anyone tell me why this post is getting so many hits over the last month or so (June '14) It's showing up in the access stats but there's no info on why.

No comments: