Today I almost had my ass handed to me by a client, but thanks to Naslite 2 and its absolutely kick-ass daily mirror option, I walked out of their office with a smile and fully compensated.
On my advice, these guys elected to use Naslite as their primary server since it was a good fit for their operation. Soon after I set up the primary server, about a month back, I also installed a secondary Naslite 2 server to act as a backup to the primary. Each server hosts two 500 GB SATA drives, and they are mirrored nightly so that all the data on the primary server is backed up on the secondary.
Well, early this morning I got a call from my client complaining that the server was beeping and very slow. It can’t be slow, I thought, but I scheduled a visit within the hour. On arrival, I checked the syslog and, to my surprise, disk-0 on the primary had generated 3500+ syslog entries within the last 4 hours. SMART had also failed. These are new drives, darn it, less than 2 months old. Needless to say, everyone in the office was pretty agitated, and I did receive my share of nervous, sarcastic potshots.
Here is the beautiful part. Five minutes after my arrival, I announced that disk-0 was going bad and the data on it was probably going bad with the drive. Then I proceeded to move disk-0 from the secondary into the disk-0 slot on the primary, and vice versa. Normally the backup server does not share its drives, so I had to share them read-only via SMB/CIFS and disable SMART. Ten minutes after my arrival, I announced that everyone could get back to work, pulling any files they had worked on that morning from the secondary server and moving them back to the primary.
Talk about relief, man. People lightened up and my name was no longer mud. I stepped out to get a replacement 500 GB drive. By the time I got back, everyone’s blood pressure seemed back to normal and they had their stuff off the secondary, so I replaced the bad drive and reset the mirror. I’ll call tomorrow and have them verify that the mirror ran successfully.
I’m going to start installing removable drive trays in the servers so I can talk people through the process over the phone: shut the servers down, swap the drives, and turn the servers back on. That will keep downtime and data loss to a minimum. It will also buy me some time so I don’t have to be on location at the drop of a hat.
All that good stuff from an itty-bitty OS that costs little and absolutely kicks ass.
