Saturday 21 November 2015

bloody peecee

I've been having a few issues with my machine lately so as i had nothing better to do yesterday I had a look into fixing it.

First issue I looked at was that I broke OpenGL when i upgraded to the latest catalyst driver. I tried various kernels, even a long-overdue update of slackware (which lead to a another few hour's diversion earlier in the month trying to un-fuckup firefox as much as i could), using the slackware kernel, etc. But finally I used strace and slackpkg to determine it was using a stale .so left over from AMD's previous abomination of an install script so once that was deleted all was well. SVM still wont work in any of my code - which is still baffling as it works from the AMD SDK samples. I should really post something on the AMD forums but i haven't used it for a while.

The other problem has been only half my RAM started showing up a few months ago; the bios shows 8G but linux only sees 4G. This is ok for what i normally use the machine for but netbeans brings it to it's knees if you have a couple of projects open. I thought it was down to a faulty DIMM and If i hadn't forgotten and needing to come home earlier yesterday I might have bought some. But I tried each individually and they seemed to work - passing the linux make test at least. I tried lower speeds, voltages, all sorts of things. It appears a fairly common problem but nobody had much of a solution.

I even upgraded the bios - which was another couple of hour diversion as that cleared the EFI boot record and I had to re-figure out how to re-install it. I used a USB-bootable tool-linux distribution called grml - very impressed - I just dd'd it onto the usb stick, and it even provides an EFI boot record so i could just use efibootmgr to re-add it. I know i've tried puppylinux multiple times before but it never worked so I might get a usb stick just to keep this handy.

The last thing I tried was dusting the sockets and then trying to jiggle the cpu in it's socket - I don't have any thermal paste so I couldn't get at it properly but i wiggled a bit without freeing the heatsink.

Now ... that appeared to work. I did some tests and so on and it seemed to be ok, ... I even wrote a post about it.

But then I had to tinker, so i was playing with the memory speeds - to see if "faster" ram really made any real difference - but I broke the whole machine so i had to power cycle it a few times till the BIOS reset everything, ... and yeah back to 4G of ram. Blast. I swore at it, shut it off, and went to bed early.

On a cold boot I did get 8G out of it but then the system crashed while I was using it, so something is amiss. It could still be the RAM I suppose but at this point i'm more inclined to believe it is the motherboard or PSU. I don't feel like investigating further for the moment and will just see how it goes (i just cold booted and it's 8G again *shrug*).

However, some good did come of all this.

When I was playing with kernels I went to the trouble to actually go through all the config options and fuck off everything I didn't need. It's down to 3.2M packed including my system filesystem with no module so i don't need an initrd, and it boots a bit faster (not that it was any slouch). I'm also trying the fully-preemptive kernel and i'm liking it so far, even under very heavy load the system remains interactive such that you barely notice at all. Hmm, I just noticed the sound mixer is missing, NM.

But the really good bit is now the CPU runs almost cold - i don't know if it's the BIOS upgrade, the kernel customisation, or some BIOS setting that changed (i already had it on 45W TDP because it got too hot), but the difference is marked. Previously any high-load task such as a 'make -j4' would cause the fan to kick in almost immediately and if i didn't also up the case fan speed (manually) fairly promptly the whole desk would take off. I've currently got a kernel build running for a few minutes triggering a load-average of over 8, with the case fan on it's slowest setting; and the CPU fan hasn't throttled up at all.

Maybe the machine is just running a lot slower now - but I can't really tell so what does it matter if it is.

Update: it got unstable so i took one dimm out. Seems ok so far. Maybe it's just the PSU? It seems like the most fragile component and it's almost certainly underpowered anyway (sigh, a poor decision that one). I dunno. I started looking at new cases & psus but if it remains stable albeit in a reduced capacity I'll be in no rush ...

No comments: