
Emergency Maintenance Window - Tue Jun 20 20:30:00 UTC 2006

Posted: Tue Jun 20, 2006 8:14 pm
by Leaf
The Metalforge server will be offline for an estimated 2-4 hours for hardware testing (memtest).

Downtime will start at: Tue Jun 20 20:30:00 UTC 2006

Posted: Tue Jun 20, 2006 8:34 pm
by Leaf
The memtest process is running now.

Further details will be posted as/when available.

Posted: Wed Jun 21, 2006 12:00 am
by Leaf
After almost 3.5 hours of testing, memtest has not found any errors.

I'm going to let it run for another 30 minutes.

Posted: Wed Jun 21, 2006 12:41 am
by Leaf
Three (3) scans with memtest all came back clean.

Going to try a full backup of the server now.

Posted: Wed Jun 21, 2006 12:59 am
by Leaf
A full backup was completed without any problems.

No errors or warnings during the backup process either.

The server is back online and ready for play.

Thank you for your patience during the outage.

Posted: Wed Jun 21, 2006 7:23 am
by Ryo
And thank you for the nice server handling :)

Posted: Wed Jun 21, 2006 7:50 am
by Mith
Are there any stability issues?
The only problem I experience when playing is lag spikes, but they can be caused by a lot of things, so it's not reasonable to blame Metalforge ;-)

Posted: Wed Jun 21, 2006 4:13 pm
by Leaf
Mith wrote: Are there any stability issues?
Maybe 2 or 3 times a year Crossfire becomes a run-away process that I can correct if I catch it in time.

Otherwise, all other reboots have been hardware upgrade/replacement related. I don't know how many months (years?) it's been since there was a power issue.

Posted: Wed Jun 21, 2006 6:30 pm
by Rednaxela
Leaf wrote: Maybe 2 or 3 times a year Crossfire becomes a run-away process that I can correct if I catch it in time.
What do you mean by a run-away process? As in an endless loop? If so, would it be possible to try this when it happens:
It is also possible to attach gdb to an already running process. This can be done with "gdb executable pid". gdb will attach to the process specified by the process ID. That might be useful if some application seems to be stuck in an endless loop and you want to figure out where exactly it hangs.
(quoted from here)
That procedure would give us an idea of where it's hanging without requiring anything special when starting up the server, and it shouldn't significantly increase the time until it's back up.
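As a rough sketch of the quoted procedure, this Python snippet runs a non-interactive gdb session that attaches to a running process, prints a backtrace for every thread, and then detaches so the process keeps running. The `-batch`/`-ex` automation is one possible approach and an assumption on my part, not necessarily how it was done on Metalforge; the function names are hypothetical.

```python
import subprocess
from typing import List, Optional


def gdb_backtrace_cmd(executable: str, pid: int) -> List[str]:
    """Build a gdb command line (same "gdb executable pid" form as quoted)
    that prints a backtrace of every thread and then detaches."""
    return [
        "gdb",
        "-batch",                      # run the -ex commands, then exit
        "-ex", "thread apply all bt",  # backtrace for each thread
        "-ex", "detach",               # release the process, leaving it running
        executable,
        str(pid),
    ]


def capture_backtrace(executable: str, pid: int,
                      timeout: float = 60.0) -> Optional[str]:
    """Run gdb and return its combined output, or None if gdb is not
    installed or does not finish within the timeout."""
    try:
        result = subprocess.run(
            gdb_backtrace_cmd(executable, pid),
            capture_output=True, text=True, timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    return result.stdout + result.stderr
```

For example, `print(capture_backtrace("/usr/local/bin/crossfire", 12345))`, where both the path and the PID are hypothetical placeholders. Attaching usually requires running as the same user as the server (or as root), and some Linux systems restrict ptrace further.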

Posted: Wed Jun 21, 2006 6:38 pm
by Leaf
Rednaxela wrote:
Leaf wrote: Maybe 2 or 3 times a year Crossfire becomes a run-away process that I can correct if I catch it in time.
What do you mean by a run-away process? As in an endless loop?
According to the server logs, Crossfire ends up using 99.9% of the CPU while trying to do things like loading a randomly generated map or some other task.

If I catch it in time, I'm able to get a core dump, which thus far has resulted in an update or patch.

If I don't catch it, then a system reboot is necessary as the server is completely unresponsive.
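Catching the problem in time depends on noticing the sustained CPU usage. One possible way to watch for it on Linux, assuming `/proc` is available, is to sample the server process's CPU time twice and compute the share of one core used in between. The function names and the 5-second interval are illustrative assumptions, not part of the Metalforge setup.

```python
import os
import time


def cpu_ticks_from_stat(stat_text: str) -> int:
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat.

    Field 2 (the command name) is in parentheses and may contain spaces, so
    split only after the last ')'. The remaining fields start at field 3
    (state), so field n is at index n - 3: utime (14) -> 11, stime (15) -> 12.
    """
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                elapsed_s: float, hz: int) -> float:
    """CPU usage over an interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (hz * elapsed_s)


def sample_cpu_percent(pid: int, interval_s: float = 5.0) -> float:
    """Read /proc/<pid>/stat twice (Linux only) and return the CPU percentage."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks_from_stat(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks_from_stat(f.read())
    return cpu_percent(before, after, time.monotonic() - start, hz)
```

A reading that stays near 100 across several samples matches the symptom described in this post; at that point a core dump could be taken, for example with gdb's `gcore` command, before the machine becomes unresponsive.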