Sunday 21 August 2011

Visual Selections

After hitting the bottle with a couple of mates last night I ended up sleeping on the couch (too lazy to make my bed after washing the sheets). I woke up about 6:30 feeling a bit ordinary, but I thought I'd get some hacking out of the way because I was awake and couldn't really face too much else.

I continued to work on removing the Piccolo2D stuff - which I completed. Now I'm just using plain Java2D for the rendering. Of course, one of the big reasons I did it was so I could zoom without making the handles scale as well ... and of course I forgot to implement that so it all zooms anyway. No biggie ... at least it seems to work. And it should be relatively simple to implement.
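A minimal sketch of how non-scaling handles can be done in Java2D, not the ImageZ code: keep the handle position in image coordinates, push that point through the view transform, and build the handle rectangle at a fixed size in component coordinates. The names FixedSizeHandle, HANDLE_SIZE and handleBounds are made up for illustration.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

final class FixedSizeHandle {
    /** Handle edge length in component pixels (an arbitrary choice). */
    static final double HANDLE_SIZE = 8.0;

    /**
     * Maps an image-space point through the view (zoom/pan) transform and
     * returns a fixed-size square centred on the result, in component space.
     */
    static Rectangle2D handleBounds(AffineTransform view, double x, double y) {
        Point2D p = view.transform(new Point2D.Double(x, y), null);
        return new Rectangle2D.Double(
                p.getX() - HANDLE_SIZE / 2, p.getY() - HANDLE_SIZE / 2,
                HANDLE_SIZE, HANDLE_SIZE);
    }

    /**
     * Paints the handle. The caller must NOT have applied 'view' to g here;
     * only the image content is drawn with the view transform applied.
     */
    static void paint(Graphics2D g, AffineTransform view, double x, double y) {
        g.setColor(Color.WHITE);
        g.fill(handleBounds(view, x, y));
    }
}
```

The same handleBounds rectangle can be reused for hit testing, so the clickable area matches what is drawn at any zoom.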

I still need quite a bit of code to implement interesting interface behaviour ... but at least it is no worse than the Piccolo2D stuff was before it.


As part of that I visited the selection code. Some time ago I had the idea of being able to show the current selection using a shadow mask - including the current 'feathering' and so on. So I added that - all 4 lines of code required for that. Well I think it looks pretty cool, and if you have a fast machine it updates in real-time as you edit the selection even though it uses a Gaussian blur for the feathering.

Update: Since I think there's now enough functionality to move it beyond the simply embarrassing stage, I've packaged the first public alpha release too. See the downloads page on the project, but don't expect too much.

Saturday 20 August 2011

Crop Tool

I had a few hours to play with this morning and I had another look at the crop tool for ImageZ.



It lets you change the top/left/bottom/right edge with the pre-lit 'handles', or drag the whole rectangle around by clicking inside the box. Clicks outside of the box let you drag to select a new bound. So pretty simple/obvious interface, although I couldn't be bothered implementing the corner handles.
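The click behaviour described above boils down to classifying the pointer position against the crop rectangle. A rough sketch of that classification follows; the class name, the Hit enum and the tolerance parameter are all hypothetical, and this is not lifted from ImageZ. With no corner handles, a click near a corner simply resolves to the left or right edge.

```java
import java.awt.geom.Rectangle2D;

final class CropHitTest {
    enum Hit { LEFT, RIGHT, TOP, BOTTOM, INSIDE, OUTSIDE }

    /**
     * Classifies a pointer position against the crop rectangle.
     * 'tolerance' is how close to an edge (in the rectangle's units)
     * the pointer must be to grab that edge.
     */
    static Hit classify(Rectangle2D crop, double x, double y, double tolerance) {
        boolean withinX = x >= crop.getMinX() - tolerance && x <= crop.getMaxX() + tolerance;
        boolean withinY = y >= crop.getMinY() - tolerance && y <= crop.getMaxY() + tolerance;
        if (!(withinX && withinY)) {
            return Hit.OUTSIDE; // start dragging out a new rectangle
        }
        if (Math.abs(x - crop.getMinX()) <= tolerance) return Hit.LEFT;
        if (Math.abs(x - crop.getMaxX()) <= tolerance) return Hit.RIGHT;
        if (Math.abs(y - crop.getMinY()) <= tolerance) return Hit.TOP;
        if (Math.abs(y - crop.getMaxY()) <= tolerance) return Hit.BOTTOM;
        return Hit.INSIDE; // drag the whole rectangle
    }
}
```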

So I had previously decided that Piccolo2D just doesn't quite fit this application - the need to have user-interface elements which don't scale with the zoom setting was the deal-breaker. Current code that uses it has some very messy mechanisms to make it work - sometimes.

The new stuff just uses some custom objects and Java2D to do the rendering and a very flat/simple 'scene graph'. So far I haven't even added any sort of optimised rendering although I probably will need to. Although right now it is fast most of the time (not so much when zoomed - but that was the same for the piccolo2d stuff too).

I also played with a slightly different event model. At present the active tool gets the mouse events, but I decided that rather than have the tool manage events for its control-handles, the handles can do it themselves. It wasn't much extra code to implement the event routing in the imageview class.
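To give a feel for that routing, here is a small stand-alone sketch. It is only a guess at the shape of such code, not the imageview class itself: PointerTarget, BoxTarget and View are invented names, and a real view would deal with drags, releases and coordinate transforms as well.

```java
import java.util.ArrayList;
import java.util.List;

final class EventRouting {
    /** Anything that can receive a mouse press: a control handle or the tool itself. */
    interface PointerTarget {
        boolean contains(double x, double y);
        void pressed(double x, double y);
    }

    /** A rectangular target that just counts the presses it receives. */
    static final class BoxTarget implements PointerTarget {
        final double x0, y0, x1, y1;
        int presses;

        BoxTarget(double x0, double y0, double x1, double y1) {
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }

        public boolean contains(double x, double y) {
            return x >= x0 && x <= x1 && y >= y0 && y <= y1;
        }

        public void pressed(double x, double y) {
            presses++;
        }
    }

    /** Stand-in for the view: offers each press to the handles first, then to the tool. */
    static final class View {
        private final List<PointerTarget> handles = new ArrayList<>();
        private final PointerTarget tool;

        View(PointerTarget tool) {
            this.tool = tool;
        }

        void addHandle(PointerTarget handle) {
            handles.add(handle);
        }

        /** Delivers the press and returns whichever target received it. */
        PointerTarget press(double x, double y) {
            // Last-added handle is treated as topmost.
            for (int i = handles.size() - 1; i >= 0; i--) {
                PointerTarget h = handles.get(i);
                if (h.contains(x, y)) {
                    h.pressed(x, y);
                    return h;
                }
            }
            tool.pressed(x, y);
            return tool;
        }
    }
}
```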

I think there's still a bit more work for the tool design before I think I'll be finished with it, but at least it feels like I'm moving forwards with it. I will keep an eye on this as I discover what works and what doesn't and eventually clean it all up to a consistent and hopefully simple interface.

Probably the next thing to look at will be improving the brush mechanics though, or maybe fixing the super-affine tool - perhaps with that I will have enough to drop a jar.

Friday 19 August 2011

Another one bites the dust ...

Mum just called to let me know another one of my brothers has died. It wasn't a surprise - he was literally on his death-bed a few times last year and despite numerous health problems continued to smoke and drink. Add to that a misspent youth of drug abuse and kleptomania and who knows what else, and it was pretty much a foregone conclusion. It's surprising perhaps that he even made it to his early 50s in the first place.

Barely knew him myself - he disappeared for about 10 years in the mid 80s, finally turning up in Perth as a lawyer (which was surprising given his past). I think he always blamed mum for pushing him to get a trade rather than follow educational prospects, but I think he must've fried his brain so much he forgot what a dope-head he was at the time. Then again, I was a bit too young at the time to really grok what was going on. Then he more or less circumnavigated the whole country in the years following, having a son with his girlfriend along the way.

Seems the census collector found him when she returned to pick up the census form (his gf left him a couple of years ago, apparently to futilely chase some young bloke). Given that the collector (probably) dropped it off a week earlier and it wasn't touched, he may have died some time before that. Bit sad I suppose. And it's not like this was in the city either, it was a country town. Nothing like a bit of community cohesion! On ya Australia!

2 down, 7 to go ...

Still, even if it wasn't unexpected and I barely knew him, death of a sibling is still a strange thing to experience, even the second time around.

Thursday 18 August 2011

GEGL/OpenCL

So apparently a lad's been working on getting some OpenCL code into GEGL. What surprises me is just how slow the result is - and how slow GEGL is at doing the super-simple operation of brightness/contrast even with a CPU.

Of course, I'm not sure exactly what is being timed here, so perhaps it's timing a lot more than just the mathematics. Well obviously it has to be, my ageing Pentium-M laptop can do a 1024x1024xRGBA/FLOAT brightness/contrast in about 70ms with simple single-threaded Java code. So 500ms for the same operation using 'optimised sse2' is including a hell of a lot of extra stuff beyond the maths. Curiously, the screenshot of the profiler shows 840 'tiles' have been processed, if they are 128x64 as suggested then that is 6MP, not 1MP as stated in the post - in that case 500ms isn't so bad (it isn't great either, but at least it's in the same order).
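For reference, the kind of simple single-threaded loop I mean looks roughly like the sketch below, together with the tile arithmetic. The formula used is one common way to define brightness/contrast and is an assumption on my part, not necessarily what GEGL computes; the class and method names are made up. No timings are implied by the code itself.

```java
final class BrightnessContrast {
    /**
     * In-place brightness/contrast on interleaved RGBA floats.
     * Assumed formula: out = (in - 0.5) * contrast + 0.5 + brightness.
     * Alpha (every 4th float) is left untouched.
     */
    static void apply(float[] rgba, float brightness, float contrast) {
        for (int i = 0; i < rgba.length; i += 4) {
            rgba[i]     = (rgba[i]     - 0.5f) * contrast + 0.5f + brightness;
            rgba[i + 1] = (rgba[i + 1] - 0.5f) * contrast + 0.5f + brightness;
            rgba[i + 2] = (rgba[i + 2] - 0.5f) * contrast + 0.5f + brightness;
        }
    }

    /** Total pixels covered by a number of equally sized tiles. */
    static long tilePixels(int tiles, int tileWidth, int tileHeight) {
        return (long) tiles * tileWidth * tileHeight;
    }

    public static void main(String[] args) {
        float[] image = new float[1024 * 1024 * 4]; // 1024x1024 RGBA float
        apply(image, 0.1f, 1.5f);

        // 840 tiles of 128x64 is 6,881,280 pixels, about 6.56 times 1024x1024.
        long pixels = tilePixels(840, 128, 64);
        System.out.println(pixels + " pixels, " + pixels / (1024.0 * 1024.0) + " x 1024x1024");
    }
}
```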

I tried posting this to the forum linked to this phoronix post but for whatever reason it refused to take the post, so I'll post it here instead.


This result is really slow. Like about 100x off if I have the relative performance of that gpu correct. Even the CPU timings look suspect - is GEGL really that slow?

A list of potential bottlenecks:
  • the locking stuff sounds overly complex, but maybe that's a gegl requirement
  • are you timing 1-off allocations which skew the results?
  • moving single tiles back and forth/processing them separately (this is a big one)
  • processing only a single tile per kernel call (this is a really big no-no)
  • might want to specify the local work-size to ensure the best memory access pattern on the opencl side. 16x16 usually works well for image processes per pixel on a gpu.
  • PCI latency, related to working with small blobs of data at a time. This can be completely hidden fairly easily by queueing up more jobs before a synchronisation point (either a clFinish or a blocking clEnqueueReadBuffer). Also you need to do a clFlush if you want the work to start while the cpu is still doing something (e.g. queuing up more work).
  • GEGL design. I know nothing about it, but if you need to go to the CPU to do synchronisation between each composed operation you may never achieve very good performance. Ideally you upload data once to the gpu, then do all processing without any cpu synchronisation until the final result is ready. By default an opencl command-queue is in-order (and no implementation supports out-of-order anyway), so you can leverage that as well. If GEGL can't already handle threads to do a similar parallelisation it might not be ready for opencl either.
  • GEGL itself. Since the GEGL CPU timings are so slow (I mean, really really slow) GEGL must be doing a lot more behind the scenes/adding so much overhead that the actual calculations are completely swamped. If this is 'fixed', then no matter what you do, such processing will always be relatively slow, although as the complexity of the algorithm increases this fixed overhead will matter less.

A list of things which can't be bottlenecks:
  • PCI bandwidth. It's just not enough data to matter.
  • OpenCL kernel - maybe it can be improved with a better work-group-size, but it's so simple it can't really be wrong.

Suggestions
  • My gut feeling is that you ignore tiles completely on the opencl backend. Even doing manual cpu-side composition of tiles into an aggregate will be fairly cheap compared to synchronous transfers/operations. Composing operations complicates matters though ...
  • Don't try to hide too much detail with abstractions. It usually just makes it harder to know what's really going on (particularly for another coder).
  • Don't worry too much about comparing such a simple operation with the CPU. The CPU should already be able to do it at about memory speed, and you're adding PCI copies in-between. It's the more interesting stuff like convolution or FFT-based algorithms where the GPU will blow it completely out of the water.
  • Think of the GPU processor as a 'stream' processor. You want to load it up with a pipeline of operations and keep the pipe stuffed with work. Waiting for the pipeline to empty before adding more work will kill performance faster than anything else. This applies at every level - the individual threads, SM's, as well as data blocks.
  • Might need to do some profiling of the CPU GEGL brightness/contrast implementation. Something other than the actual calculations is taking most of the time.
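To illustrate the 'compose tiles on the cpu side' suggestion: gathering tiles into one contiguous buffer is just a row-wise copy, after which a single large transfer can replace many small ones. This is a sketch under my own assumptions about layout (row-major, interleaved channels, equally sized tiles); it knows nothing about GEGL's real tile storage, and it deliberately contains no OpenCL calls.

```java
final class TileGather {
    /**
     * Copies one tile into its place in a larger row-major image buffer.
     *
     * tile        tile data, tw x th pixels, 'channels' floats per pixel
     * image       destination buffer, imageWidth pixels per row
     * tileX/tileY tile position counted in tiles, not pixels
     */
    static void copyTile(float[] tile, int tw, int th,
                         float[] image, int imageWidth,
                         int tileX, int tileY, int channels) {
        int rowLen = tw * channels;
        for (int row = 0; row < th; row++) {
            int src = row * rowLen;
            int dst = ((tileY * th + row) * imageWidth + tileX * tw) * channels;
            System.arraycopy(tile, src, image, dst, rowLen);
        }
    }
}
```

Once every tile has been copied, the 'image' array is what would be handed to one upload, with the whole image processed by one kernel invocation.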

In the nvidia profiler, look at the 'gpu time width plot' to see when the gpu is actually doing work. You'll probably see the individual jobs (and memory transfers) take almost no time and it's mostly sitting idle waiting for work from the cpu. That idle time is probably 99% of the elapsed time, and it's where you'll find all the gains at this point.

Don't even bother looking at the graph you posted - memory transfer time will have to be greater than the processing time since the processing is so simple and the gpu memory bandwidth is so much higher than pci speed. All you're doing is confirming that fact. The memory transfer time can mostly be hidden using asynchronous programming techniques anyway, so it is basically irrelevant.

Wednesday 17 August 2011

10K

So in a bit over 2 years since I turned on the stats, this blog broke the 10K hit barrier in the last few weeks. I guess that's nothing particularly to speak of but for what is mostly a bunch of private rants and technical musings it's not insignificant either.

Although one particular page has the lion's share of the hits - and that it continues to do so is interesting in itself. This is the long and rather rambling post about trying to find a Java FFT library and some abuse about visual studio. Although it's clearly the Java FFT that people are searching for to find that page! It shows that someone is doing some scientific programming in Java, which I find interesting. The only thing I really wish Java had for this was a native complex type - doing anything with complex numbers quickly gets ugly, and even worse if you want some speed.
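A small illustration of the complex number complaint, with made-up names: without operator overloading every operation is a method call, and the object version allocates a new instance per result. The usual way to get speed back is to drop the object and work on interleaved real/imaginary arrays, which is faster but reads nothing like the maths.

```java
final class ComplexDemo {
    /** Immutable complex value: readable enough, but one allocation per operation. */
    static final class Complex {
        final double re, im;

        Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        Complex times(Complex o) {
            return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
        }
    }

    /**
     * In-place multiply of interleaved (re, im) pairs: a[k] = a[k] * b[k].
     * No allocation, but the complex arithmetic is spelled out by hand.
     */
    static void multiplyInterleaved(float[] a, float[] b) {
        for (int i = 0; i < a.length; i += 2) {
            float re = a[i] * b[i] - a[i + 1] * b[i + 1];
            float im = a[i] * b[i + 1] + a[i + 1] * b[i];
            a[i] = re;
            a[i + 1] = im;
        }
    }
}
```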

Second on the list is the BeagleBoard GS2010 wrap-up post with about half the number of hits. For such a small community there is quite a lot of interest out there. Unfortunately work commitments and other interests have pulled me away from spending time with the BeagleBoard, which is a bit of a pity. For the moment all I'm using mine for is playing internet radio plugged into my stereo. It's sitting boxless on a coffee table next to the amp and my 'user interface' consists of telnetting to it from my workstation and running mplayer on the command line :)

The next few 'high hitters' (if you could call them that) are low-level posts on: SSE optimisations (which basically said they don't make nearly as much difference as vector ops on CELL did), OpenCL Images vs Arrays (which I find rather difficult to understand myself, but I think the gist of it is that you have to write code differently but both perform about the same), and Context Switching on ARM. I would guess the last one may have helped a few students with their assignments ;-) - it doesn't seem to be a topic of general interest.

Onward and upward

Like everyone else I'm pretty useless at predicting the future but I can probably take a rough guess at where my interests will take me in the next few years. I don't have a need for any particular software any-more (beyond what is a yum invocation away), so whatever I work on is only for entertainment (and perhaps a bit for education, but just solving problems for work educates one a great deal).

I think OpenCL will continue to grow - socles is already my most 'hit' google-code project and the only one anyone ever mailed me about (actually someone did mail me about puppybits). It isn't really going anywhere at the moment because I can't really think of anything to use it for myself - I have some vague ideas of a video-something application (mediaz/VideoZ), but there is so much to think about and code before it even gets started. As applications get bigger and more complex, that starting hump is quite a psychological barrier to get over when there are other sources of entertainment competing for my time. Back to socles though - OpenCL is still a bit of a niche, and Java + OpenCL even more-so, so I'm in no rush to expand it until I can find something to use it for.

As an aside, I've noticed a worrying trend on the OpenCL forums - which seems to be more afflicted by this than other forums, although I've seen it before elsewhere and it's probably just because I don't tend to hang around forums a lot these days. And that is this: inexperienced programmers - most likely students, with a very limited command of the English language, posting questions which demonstrate they can't even be bothered to read the manuals (OpenCL has some very good resources available). And even worse, to paraphrase a comment from the BeagleBoard list, the queries generally amount to "I'm lazy, can you please do my homework for me?". Extremely rude and disrespectful and really messes up mailing lists and forums.

Puppybits ... well that will probably continue to stay on hold. Unless I take another big break between contracts again and have loads of time to work on it. Every now and then I have a look to see if there are any simple USB host stacks to snarf to help progress it, but nothing's popped up so far. Without USB one is severely constrained. If I ever get the OpenPandora I ordered that might pique my interest in ARM hacking again though. I have a big bunch of 'zedos' work I never committed which I probably should if only so it doesn't get lost from my backups (I `upgraded' my OS a few months ago and lost my development environment for example).

mediaz/ImageZ ... is probably of little use to anyone else, but I will keep poking away at it when I have the inclination. There are a few basic things I need to get sorted out before I'm prepared to drop a jar of it, which I will do at some point. One is the tool overlay mechanism which I'm refining again as I work on a crop-tool. Probably a couple of days work.

jjmpeg ... is already quite useful, although to package it up and polish it off would require a lot more work and time. This is one of those building blocks I needed for the video application I was thinking about, so now it's to some state of usefulness I can at least entertain the idea of moving forward with that. Also, if I decide to switch to it for some work code I have it would probably get a bit more of a work-out as well - it's something I'm considering since I can't get xuggle to build for windows (without more time than I'm willing to waste) and its ffmpeg libraries are getting a bit out of date. Not to mention tied to 32 bits.

And I'll keep ranting about bits and pieces, cooking, gardening and other shit.

Thursday 11 August 2011

Video/Audio Player

I just checked in a reasonably complete audio/video player example using jjmpeg.


It synchronises the video to the audio if it's there, allows one to seek and pause and so on. The pause function is a bit crap - it keeps running any queued up data from the decoder - but that's only a fraction of a second. It uses a JLabel for output via a BufferedImage, which works well enough if the machine is quick. There are some other problems, but it works reasonably well all things considered. It's using JOAL for audio output.
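For anyone wondering what 'synchronise the video to the audio' means in practice, the general idea is to treat the audio playback position as the master clock and compare each decoded frame's timestamp against it. The sketch below shows only that decision; it is not the code in the mediaplayer example, and the names and the 100ms lateness limit are assumptions picked for illustration.

```java
final class AvSync {
    enum Action { WAIT, SHOW, DROP }

    /** How late a frame may be before it is skipped (an arbitrary choice). */
    static final long LATE_LIMIT_MS = 100;

    /**
     * Decides what to do with a video frame given the audio clock.
     * Both values are presentation times in milliseconds.
     */
    static Action decide(long framePtsMs, long audioClockMs) {
        long diff = framePtsMs - audioClockMs;
        if (diff > 0) {
            return Action.WAIT;  // frame is early: sleep for waitMs() first
        }
        if (diff < -LATE_LIMIT_MS) {
            return Action.DROP;  // frame is hopelessly late: skip it
        }
        return Action.SHOW;      // due now, or only slightly late
    }

    /** How long to wait before showing an early frame; 0 if it is already due. */
    static long waitMs(long framePtsMs, long audioClockMs) {
        return Math.max(0, framePtsMs - audioClockMs);
    }
}
```

When there is no audio stream, a wall-clock timer started at playback would have to stand in for audioClockMs.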

The code is part of the jjmpegdemos sub-project, and is in the au.notzed.jjmpeg.mediaplayer namespace.

This is the one I mentioned I was working on 2 months ago, and since it was reasonably complete (and I don't think I'll be working on it again for a little while) I thought it was about time I checked it in. I have a swathe of stuff for socles I should probably upload at some point too.

matlabotomisation

matlabotomisation
- vb
To write or modify a matlab or octave script in order to achieve maximum efficiency in processing time. Thus rendering the algorithm virtually indecipherable to both mathematicians and software engineers alike.

Yes, I'm back to reading matlab scripts again - an unfortunately common task when dealing with research from computer scientists.

matlab (the language) is a really basic scripting language, with a library of routines that make processing mathematical algorithms possible, but not exactly easy. It isn't something that mirrors the mathematical language very concisely, nor maps easily to procedural languages. If that were its only shortcoming it would be bad enough, but it is also really very slow.

So to get performance out of matlab one has to write code using (multi-dimensional) array types. Writing a loop which generates results one at a time is far too slow, so instead you generate a table of indices and then write a formula that uses these indices to generate all results at once. This can be fairly concise, and it sort of sounds like functional programming or representing mathematics cleanly, but unfortunately it falls well short of this goal and often the code is off generating complex sets of indices which can be confused with it actually doing work. So you end up with something that might run reasonably quick (for matlab anyway), but is a real brain-ache trying to understand. It neither matches the mathematics, nor the processing steps the cpu takes to form the result.

I prefer when the scientist just gives up and writes simple matlab - for one, it makes my life a lot easier, and as a bonus even a trivial Java conversion will run at least an order of magnitude faster. So it makes me look smarter too!

Sunday 7 August 2011

OpenRaster, SPI, etc.

After poking around ImageZ a bit late last night I thought I'd tackle multi-layer reading/writing.

So I wrote a reader and eventually a saver for OpenRaster format. I decided on OpenRaster since it is so simple, and it was pretty much how I was going to write it anyway - only I was going to avoid the XML. Being a zip file makes things simple in Java too. It seems to interoperate well enough so far (since I only have 'normal' blend mode working anyway), although if you save layers in greyscale or 16 bit formats from ImageZ and then load/save them from MyPaint, everything is converted to RGBA 8 bit.
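To show why the zip container keeps things simple, here is a stripped-down sketch of writing such a file with nothing but java.util.zip and ImageIO. It is not the ImageZ saver. The layout (a 'mimetype' entry stored uncompressed as the first entry, a stack.xml describing the layers, PNG layer data) and the stack.xml attributes are written from my reading of the OpenRaster spec and should be checked against it; the spec also asks for a merged image and a thumbnail, which this sketch leaves out. The class name and the data/layerN.png naming are made up.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import javax.imageio.ImageIO;

final class OraSketch {
    /** Writes the layers (layers[0] topmost) as an OpenRaster-style zip. Closes 'out'. */
    static void write(OutputStream out, BufferedImage[] layers, int width, int height)
            throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // The mimetype entry goes first and must be STORED, which means
            // the size and CRC have to be supplied up front.
            byte[] mime = "image/openraster".getBytes(StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();
            crc.update(mime);
            ZipEntry mimeEntry = new ZipEntry("mimetype");
            mimeEntry.setMethod(ZipEntry.STORED);
            mimeEntry.setSize(mime.length);
            mimeEntry.setCompressedSize(mime.length);
            mimeEntry.setCrc(crc.getValue());
            zip.putNextEntry(mimeEntry);
            zip.write(mime);
            zip.closeEntry();

            StringBuilder xml = new StringBuilder();
            xml.append("<?xml version='1.0' encoding='UTF-8'?>\n");
            xml.append("<image w=\"").append(width).append("\" h=\"").append(height).append("\">\n");
            xml.append(" <stack>\n");
            for (int i = 0; i < layers.length; i++) {
                String src = "data/layer" + i + ".png";
                xml.append("  <layer name=\"layer ").append(i)
                   .append("\" src=\"").append(src).append("\" x=\"0\" y=\"0\"/>\n");

                // Encode to memory first so ImageIO never touches the zip stream directly.
                ByteArrayOutputStream png = new ByteArrayOutputStream();
                if (!ImageIO.write(layers[i], "png", png)) {
                    throw new IOException("no PNG writer available");
                }
                zip.putNextEntry(new ZipEntry(src));
                zip.write(png.toByteArray());
                zip.closeEntry();
            }
            xml.append(" </stack>\n</image>\n");

            zip.putNextEntry(new ZipEntry("stack.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
    }
}
```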

I still need a float format though - I started looking into OpenEXR last year - but that was about when I stopped working on ImageZ for a chunk of time too - but I hit some walls with the test images. I can't recall where the issue was now though. This isn't really a high priority.

Today I thought I'd work on writing an ImageReaderSpi for the format as well - for example since currently OpenRaster files do not display in the open requester. But I got too side-tracked trying to implement meta-data and other features which in hindsight I probably don't need. I might revisit it again later with reduced requirements and see if I can get it working.

Along the way I also played with JAXB XML (de)serialisation which looks pretty nice - as nice as things can get with XML I guess. In general I try to avoid XML as much as possible because I think it's the phlegm, vomit, and anal leakage of devil's spawn, so this was a pleasant surprise. No surprise that it wasn't originally an apache project though ...

Also started work on a crop tool. This is exposing me once again to issues with the tool overlays, so I should probably think about cleaning that up somehow too. I'm using piccolo2d at the moment, but the way I have the tools track the current zoom is a right pig's breakfast.

Saturday 6 August 2011

mediaz <-- ImageZ

I finally uploaded ImageZ to google code, under a new project mediaz. I'm pre-empting myself somewhat here, but I'm leaving room should I develop some other tools - e.g. if the VideoZ stuff ever goes anywhere.

I didn't get around to cleaning up everything I had intended to, so it's well short of being terribly useful, but that's the way it goes I guess. I didn't really want to spend my Saturday at the computer again, but there's not much else to do - everyone else is out and it's a crappy cold, windy and eventually wet day we're headed for.

Update: I had intended to catch up with a couple of mates for some beer and food this evening but I slept in and then it started pissing down with rain so I ended up stuck inside again (watching Port get totally arse-raped by Collingwood). Then I ended up playing with ImageZ a bit more and realised I'd sold it a bit short - there is quite a lot of functionality there after all, even if some big and rather important parts are missing. I did a bit of hacking on it as well as some house-keeping on the google code page.

Friday 5 August 2011

Nvidia opencl 1.1

Yay, so NVidia finally released an opencl 1.1 spec driver. I guess now I should read up more on opencl 1.1 and see if there's anything I can take advantage of - so far it wasn't even on the radar because of their complete lack of support; and I'm happy enough with 1.0 anyway. I'm not sure this is really enough to restore confidence that OpenCL is a first-class citizen on NVidia hardware - their weekly emails haven't mentioned OpenCL for months. We're headed for AMD hardware anyway, if only to try alternatives.

Speaking of AMD, I thought I might try to create a Java binding for the AMD FFT library - I wouldn't mind evaluating it to see if it could replace my current FFT implementation (the apple one, as ported in the jocl demos tree). Unfortunately it uses some types and interfaces which are tricky to wrap in Java, at least in a way which works independent of the architecture's native size. So for now I might put it on the back-burner. (I looked at gluegen briefly but it had trouble parsing something - and the error messages it gives aren't much help).

Thursday 4 August 2011

Mailing Lists

I just set up some mailing lists for jjmpeg and socles.

I can't tell from google-code if there is much interest in the projects, but it seems a better idea to set up a mailing list than to receive direct emails about them.

These are still slow long-burn projects I'm working on when I feel inspired, and inspiration varies greatly from week to week.

Tuesday 2 August 2011

Bullies, liars, and arseholes.

Hmm, so my sister in law just got sacked from one of her cleaning jobs. After quite a bit of bullying from a fellow employee and what can only be considered racism/discrimination from upper management (e.g. complaining about her diminutive stature in her first week) they finally found enough of an excuse to fire her. A sham 'explain yourself' meeting that went on for hours, followed by a letter saying that she simply lied about everything in the meeting (which is simply not true).

Filthy liars.

I know hardly anyone reads my blog and even fewer locally, but for those, perhaps complain about the lack of cleanliness next time you're in the central market, or maybe just spit on the floor!

I probably won't bother ever going back there myself - not that I was a regular customer anyway.

Monday 1 August 2011

Playing with Web Start

After a lot of frobbing around I got a simple java webstart demo working for jjmpeg.

jjmpegdemos.jnlp

Assuming you have Java Web Start installed this should launch the application - and after a lot of 'this is untrusted' errors should end up with the application running. It lets you run a very simple music player demonstration. It uses JOAL for the audio output and jjmpeg for the decoding.

You also need to have the ffmpeg shared libraries installed. A recent version. Which probably means this won't work with microsoft platforms yet - although I suppose if the ffmpeg libraries that are available here: http://ffmpeg.zeranoe.com/builds/ are in the path it might work.

On GNU/Linux it will depend on compatible libavcodec/etc versions. I'm using Fedora 13 and 14, and it worked fine on both with ffmpeg-libs from rpmfusion. I also tested x86 and amd64 platforms.

Anyway this is really just an experiment - I doubt it will work in general on every platform.