NASLite Network Attached Storage

www.serverelements.com
Task-specific simplicity with low hardware requirements.

All times are UTC - 5 hours [ DST ]




Posted: Tue Jun 15, 2010 3:19 pm

Joined: Thu Jun 15, 2006 11:06 pm
Posts: 24
Location: New Orleans, LA
I've been running NASLite-2 on an Athlon XP 1400 box for several years with rock-solid reliability.

About a month ago, with no changes to the hardware, I began getting intermittent crashes that leave the computer dead in the water (no response via any protocol) but still powered up with fans running, and with the attached dump on the monitor. (I normally run headless, but I connected a monitor when these crashes started.)

The crashes have been occurring every few days, but they don't seem connected to any particular activity on the file server. One occurred at a time when I'm certain there was no activity at all on the server, and I've been unable to trigger a crash by hitting the server with huge simultaneous bidirectional transfers.

When a crash occurs, power cycling and rebooting brings the system back up for another few days, and there has never been a problem when booting.

I've tried the following without any improvement, not because I thought they had anything to do with the problem, but because they were quick and easy to do:
- Replaced the Netgear GA311 NIC with another new one (also a GA311).
- Upgraded NASLite-2 from v2.60 to v2.62.

Can anyone make any sense out of the display dump? Does it offer any hints as to where to start looking?

I acknowledge that this is probably a hardware issue of some sort (motherboard, processor, memory). I'm also familiar with PC hardware, so I pretty much know what I'm going to need to do to track down a hardware failure. But I'm not so familiar with Linux, so I thought before I started hardware troubleshooting from scratch I'd see if anyone could make sense out of the display dump.

Thanks,
Jim


Attachments:
File comment: Monitor display after kernel panic.
NASLite 033 (Custom).JPG
Posted: Tue Jun 15, 2010 10:23 pm
Site Admin

Joined: Tue Jul 13, 2004 4:11 pm
Posts: 1771
Location: Server Elements
From the looks of your screen, the problem is caused by the NIC. Try moving the NIC to a different slot, or set the PnP OS option in your BIOS to OFF, reboot, then set it back to ON and reboot again. Hopefully the NIC will then get a clear IRQ.


Posted: Tue Jun 15, 2010 11:43 pm

Joined: Thu Jun 15, 2006 11:06 pm
Posts: 24
Location: New Orleans, LA
Tony wrote:
From the looks of your screen, the problem is caused by the NIC. Try moving the NIC to a different slot, or set the PnP OS option in your BIOS to OFF, reboot, then set it back to ON and reboot again. Hopefully the NIC will then get a clear IRQ.


OK, thanks Tony, I'll give that a try. I did replace the NIC, but I put the new one back in the same slot.

The odd thing is that this configuration has been so stable for so long.

I did leave one thing out of my first message, though: a couple of months ago a hard drive failed in the server, and I haven't replaced it yet. Instead, I just unplugged it from the IDE cable. So MAYBE I introduced an IRQ conflict that wasn't there before simply by removing that drive.

I haven't dealt with IRQ conflict issues since Windows 98 days....

Jim


Posted: Wed Jun 16, 2010 12:23 am

Joined: Thu Jun 15, 2006 11:06 pm
Posts: 24
Location: New Orleans, LA
Tony wrote:
From the looks of your screen, the problem is caused by the NIC. Try moving the NIC to a different slot, or set the PnP OS option in your BIOS to OFF, reboot, then set it back to ON and reboot again. Hopefully the NIC will then get a clear IRQ.


Tony,

I just re-read your post, and I'd like to clarify: should the BIOS PnP OS setting be ON/YES? Your comment seemed to imply that it should, since you suggest turning it OFF, rebooting, and then turning it back ON and rebooting again. I ask because I've had PnP OS set to OFF/NO for years, and I'm now wondering if that's why removing the failed HDD introduced what looks like an IRQ conflict.

I've now set the BIOS PnP OS option to ON/YES, so unless you suggest otherwise I'm going to run it that way for a few days and see what happens.

Jim


Posted: Wed Jun 16, 2010 11:37 pm
Site Admin

Joined: Tue Jul 13, 2004 4:11 pm
Posts: 1771
Location: Server Elements
Moving to a different slot will usually be enough. Ideally, you'll want PnP OS set to ON, but toggling it and rebooting forces some BIOSes to reallocate IRQs. It doesn't always work, but it's worth a try.
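For checking a suspected IRQ conflict from the Linux side rather than the BIOS, the kernel lists which devices sit on each interrupt line in /proc/interrupts. Below is a minimal Python sketch, assuming a machine with a standard Linux /proc and some way to run a script (the thread doesn't say whether NASLite-2 offers a shell). It reports every numbered IRQ shared by more than one device. The function name `shared_irqs` and the sample text are hypothetical, not taken from the server in this thread. A shared line is only a lead, not proof of a conflict, since PCI devices are allowed to share IRQs.

```python
# Sketch: list IRQ lines used by more than one device, from
# /proc/interrupts-style text. Assumes a standard Linux /proc layout;
# SAMPLE is made up, not taken from the server discussed in the thread.
from __future__ import annotations

SAMPLE = """\
           CPU0
  0:    8123456    XT-PIC-XT-PIC    timer
  1:         10    XT-PIC-XT-PIC    i8042
 11:     654321    XT-PIC-XT-PIC    eth0, ide1
NMI:          0   Non-maskable interrupts
"""


def shared_irqs(text: str) -> dict[int, list[str]]:
    """Map each numbered IRQ that has 2+ devices to its device names."""
    shared: dict[int, list[str]] = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Skip the CPU header row and named rows such as NMI or LOC.
        if not sep or not label.isdigit():
            continue
        # Devices form a comma-separated list at the end of the row; the
        # first one is the last whitespace-separated token before a comma.
        pieces = rest.split(",")
        tokens = pieces[0].split()
        if not tokens:
            continue
        devices = [tokens[-1]] + [p.strip() for p in pieces[1:] if p.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the made-up sample
    for irq, devices in sorted(shared_irqs(text).items()):
        print(f"IRQ {irq}: {', '.join(devices)}")
```

On the made-up sample this reports IRQ 11 shared by eth0 and ide1; the timer and keyboard lines are skipped because each has a single device.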


Posted: Fri Sep 10, 2010 6:29 pm

Joined: Thu Jun 15, 2006 11:06 pm
Posts: 24
Location: New Orleans, LA
This is an old thread that I started, but I thought it might be helpful to let you know that I found a solution to this problem.

Before I finally discovered the cause, I had replaced the ENTIRE server hardware (NIC, memory, motherboard, processor, power supply, and graphics card; everything but the hard drives and case), and I was STILL getting these intermittent NASLite crashes. I moved the NIC around to different slots and played with BIOS settings, to no avail.

The problem turned out to be a compatibility issue between my D-Link network switches and my Netgear GA-311 NIC. Before this problem started, I was using Netgear GS60x series gigabit switches. But I didn't like the Netgear switches, since they seemed to have difficulty holding gigabit speeds even on Cat 5e. So I replaced them with D-Link DGS-220x switches, which I like much better (solid gigabit throughout). Apparently the D-Link switches and the Netgear GA-311 NIC didn't play nicely together.

I replaced the GA-311 NIC with a D-Link DGE-530T NIC and the problem went away. My NASLite server has now been running rock solid for about two months. During that time I have had ZERO packet errors, whereas before I was getting quite a few.
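The thread doesn't say how the packet errors were counted. On a generic Linux system the kernel keeps per-interface counters in /proc/net/dev, and the minimal Python sketch below, assuming that file's standard layout (8 receive columns followed by 8 transmit columns), pulls out the receive and transmit error columns. The function name `interface_errors` and the sample numbers are hypothetical, not data from this server.

```python
# Sketch: read per-interface receive/transmit error counters from
# /proc/net/dev-style text. Assumes the standard Linux column layout;
# the SAMPLE numbers are made up for illustration.
from __future__ import annotations

SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1234      10    0    0    0     0          0         0     1234      10    0    0    0     0       0          0
  eth0: 9876543   12345   42    0    0     3          0         0   123456    2345    0    0    0     0       0          0
"""

RX_ERRS = 2   # third receive column
TX_ERRS = 10  # third transmit column (the 8 receive columns come first)


def interface_errors(text: str) -> dict[str, tuple[int, int]]:
    """Map interface name -> (rx errors, tx errors)."""
    errors: dict[str, tuple[int, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        # Partition on ':' so "eth0:9876543" (no space) also parses.
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        errors[name.strip()] = (int(fields[RX_ERRS]), int(fields[TX_ERRS]))
    return errors


if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the made-up sample
    for iface, (rx, tx) in sorted(interface_errors(text).items()):
        print(f"{iface}: rx_errors={rx} tx_errors={tx}")
```

The counters are cumulative since boot, so comparing two readings taken a few hours apart shows whether errors are still accumulating.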

I'm posting this primarily to help others that might run into similar problems.

I'm also somewhat disappointed that NASLite couldn't handle this more gracefully. I would have expected network or packet errors, not kernel panics. But I'm sure this has more to do with the Linux kernel than with NASLite itself.


Posted: Fri Sep 10, 2010 9:04 pm

Joined: Sun Apr 02, 2006 9:05 pm
Posts: 1688
Location: Up State NY in the USA!!!!
I'd be more pissed at Realtek than at Linux or NL (NASLite). Their chips are likely at the core of the issues you were having. If they were fully compliant, you wouldn't have to worry about which NIC you paired with which switch.

Yet another reason to use enterprise-level managed switches and NICs: the performance is better, and the reliability and compatibility are far better.

Glad you found your way through the issue and thanks for the heads up.

Mike


Posted: Sat Sep 11, 2010 10:01 am

Joined: Sat Nov 19, 2005 6:39 pm
Posts: 633
Location: California
Thanks for that very useful "solutions" update :)

