NASLite Network Attached Storage

www.serverelements.com
Task-specific simplicity with low hardware requirements.

All times are UTC - 5 hours [ DST ]




 Post subject: Degraded Array?
PostPosted: Sat Sep 09, 2006 11:22 am 

Joined: Fri Jun 16, 2006 5:09 am
Posts: 130
Just noticed the following messages in my syslog. :oops:
I suspect that I have a failed disk in the RAID 5 array. :shock:
Is this the only message one should expect if a disk goes bad?
My RAID card, a 3ware 7500-8, doesn't have an onboard speaker, so this is a bit inconspicuous.


Aug 30 21:14:39 user.warn kernel: 3ware Storage Controller device driver for Linux v1.02.00.037.
Aug 30 21:14:39 user.warn kernel: 3w-xxxx: AEN: ERROR: Unit degraded: Unit #0.
Aug 30 21:14:39 user.notice kernel: scsi1 : Found a 3ware Storage Controller at 0xbc00, IRQ: 3, P-chip: 1.3
Aug 30 21:14:39 user.info kernel: scsi1 : 3ware Storage Controller
Aug 30 21:14:39 user.warn kernel: Vendor: 3ware Model: Logical Disk 0 Rev: 1.0
Aug 30 21:14:39 user.warn kernel: Type: Direct-Access ANSI SCSI revision: 00
Aug 30 21:14:39 user.debug kernel: libata version 1.11 loaded.
Aug 30 21:14:39 user.warn kernel: Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
Aug 30 21:14:39 user.warn kernel: SCSI device sda: 4102491904 512-byte hdwr sectors (2100476 MB)
Aug 30 21:14:39 user.info kernel: /dev/scsi/host1/bus0/target0/lun0: p1


 Post subject:
PostPosted: Sat Sep 09, 2006 12:18 pm 

Joined: Sun Jul 09, 2006 10:26 am
Posts: 428
Location: UK
take a look here

https://twiki.cern.ch/twiki/bin/view/FIOgroup/DiskPrbTw


 Post subject:
PostPosted: Sun Sep 10, 2006 10:26 am 

Joined: Mon Jan 23, 2006 11:22 am
Posts: 144
Based on the "unit degraded" message, it would appear that a drive has failed - a "unit impaired" message would also have the same significance.

Your dilemma now is to determine which physical drive has failed. If you can do this accurately, you should be able to replace it and have the array rebuild successfully. The syslog messages do point to unit 0, but I suspect that is the logical drive and not the physical drive, since it is being reported as a 2 TB drive.

Most RAID controller manufacturers provide management utilities which can be used to identify the failing physical disk. However, it does not appear that NASLite-2 includes these tools or makes provision for you to install them, so you may have to temporarily connect the array to a different system on which you can install and run the utility.

Edit - You could try rebooting the system and looking at the BIOS messages. They may indicate which drive has failed, though more through the absence of a message than through an actual error message. For example, if the BIOS lists only three drives detected instead of four, the inference would be that the missing drive is the failed one.

Back up your data whilst you still have access to it, because if you replace the wrong drive you will, depending on the RAID configuration, lose the entire logical drive.


 Post subject:
PostPosted: Mon Sep 11, 2006 3:23 am 

Joined: Fri Jun 16, 2006 5:09 am
Posts: 130
I did a reboot but the message still remains so it seems that I do have a failed disk. I have the system off now until I get a chance to connect a monitor to it and try, via the BIOS util, to ascertain which disk it is...

The link kindly posted by gaiden gave some clues, and it seems the newer 9xxx series cards include the port number in the error message but the 7xxx/8xxx series do not.

Maybe Tony and Ralph could consider including an audible alert based on such messages in the syslog.

The alternative is to write a client-side script that parses the syslog HTML and generates an alert if such messages are detected.
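
As a rough illustration of that idea, here is a minimal Python sketch that scans syslog text for 3w-xxxx AEN lines and picks out the ones with ERROR severity. The regular expression is modelled only on the message shapes quoted in this thread, and the names `AenEvent`, `find_aen_events` and `needs_alert` are made up for the example. It assumes the syslog page has already been reduced to plain text with one entry per line.

```python
import re
from typing import List, NamedTuple

# Modelled on the two message shapes quoted in this thread; other driver
# versions may word their AEN lines differently (assumption).
AEN_RE = re.compile(
    r"3w-xxxx:(?: scsi\d+:)? AEN: (?P<severity>[A-Z]+): "
    r"(?P<event>[^:]+): Unit #(?P<unit>\d+)"
)


class AenEvent(NamedTuple):
    severity: str  # e.g. "ERROR" or "INFO"
    event: str     # e.g. "Unit degraded"
    unit: int      # logical unit number reported by the controller
    line: str      # the full syslog line, for the alert text


def find_aen_events(syslog_text: str) -> List[AenEvent]:
    """Return every 3w-xxxx AEN entry found in the given syslog text."""
    events = []
    for line in syslog_text.splitlines():
        match = AEN_RE.search(line)
        if match:
            events.append(
                AenEvent(
                    match["severity"],
                    match["event"],
                    int(match["unit"]),
                    line.strip(),
                )
            )
    return events


def needs_alert(event: AenEvent) -> bool:
    """Treat only ERROR-severity AENs as worth an alarm."""
    return event.severity == "ERROR"


SAMPLE = """\
Aug 30 21:14:39 user.warn kernel: 3w-xxxx: AEN: ERROR: Unit degraded: Unit #0.
Aug 30 21:14:39 user.info kernel: scsi1 : 3ware Storage Controller
Sep 17 10:13:20 user.warn kernel: 3w-xxxx: scsi1: AEN: INFO: Rebuild started: Unit #0.
"""

if __name__ == "__main__":
    for ev in find_aen_events(SAMPLE):
        if needs_alert(ev):
            print("ALERT:", ev.line)
```

Filtering on severity rather than on the exact wording means the INFO entries for a rebuild do not raise an alarm, while anything the driver flags as ERROR does.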


 Post subject:
PostPosted: Mon Sep 11, 2006 7:47 am 

Joined: Mon Jan 23, 2006 11:22 am
Posts: 144
Perhaps SNMP or emailed alerts could be used. As I understood it, the capacity of a diskette was once the limiting factor on which features could be included; now that we can boot from other media, perhaps these can be considered.


 Post subject:
PostPosted: Mon Sep 18, 2006 1:24 am 

Joined: Fri Jun 16, 2006 5:09 am
Posts: 130
Just an update.
The failed drive (Unit 3) is now replaced and the array is rebuilt.

The initial syslog entry was:

Sep 17 10:13:20 user.warn kernel: 3w-xxxx: scsi1: AEN: INFO: Rebuild started: Unit #0.

followed by

Sep 17 12:37:08 user.warn kernel: 3w-xxxx: scsi1: AEN: INFO: Rebuild complete: Unit #0.

Looks like the RAID controller just repaid a part of its price.

I wish there was an audible alarm though when the array went to degraded status.....

I ran for a good 3 weeks with a degraded array without knowing it.
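
For anyone curious how long the rebuild took, the two entries quoted above can be subtracted directly. This is only a small sketch: syslog timestamps carry no year, so the year 2006 is assumed from the post dates, the helper name `syslog_time` is made up, and month abbreviations are assumed to be English.

```python
from datetime import datetime, timedelta


def syslog_time(line: str, year: int = 2006) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    The year is not in the log, so it has to be supplied (assumption).
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


started = (
    "Sep 17 10:13:20 user.warn kernel: 3w-xxxx: scsi1: "
    "AEN: INFO: Rebuild started: Unit #0."
)
finished = (
    "Sep 17 12:37:08 user.warn kernel: 3w-xxxx: scsi1: "
    "AEN: INFO: Rebuild complete: Unit #0."
)

elapsed: timedelta = syslog_time(finished) - syslog_time(started)
print(elapsed)  # 2:23:48
```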


 Post subject:
PostPosted: Mon Sep 18, 2006 2:35 am 

Joined: Sun Apr 02, 2006 9:05 pm
Posts: 1688
Location: Up State NY in the USA!!!!
A good reason to check your logs frequently.

Mike


 Post subject:
PostPosted: Mon Sep 18, 2006 8:17 am 

Joined: Fri Jun 16, 2006 5:09 am
Posts: 130
From what I remember, the first instance of a log entry was at boot time. I am not certain that a drive failure, causing a degraded array, would result in an immediate log msg appearing in the NASLite syslog page.

If this is the case, then given that a NASbox is not supposed to be booted frequently, it could still be a problem.

Maybe Tony or Ralph could shed some light on this.


 Post subject:
PostPosted: Mon Sep 18, 2006 8:54 am 

Joined: Mon Jan 23, 2006 11:22 am
Posts: 144
In my experience you will not receive notification of a drive failure if the RAID management utilities are not running, which I believe is the case with NASLite.

The following is a quote from the 3Ware Escalade User Guide so it would appear that audible alerts are provided by the management utilities.

Quote:
3ware’s 3DM™, a web-based storage management utility, sends notification of drive failures via email and audible alerts, providing the system administrator with local and remote asynchronous event reporting of array activities.


It should be noted that audible alerts are only useful if the equipment is within earshot of the operator, which is not always the case with servers. IMHO, e-mail or SNMP alerts are much more desirable.


 Post subject:
PostPosted: Mon Sep 18, 2006 3:08 pm 
Site Admin

Joined: Tue Jul 13, 2004 4:11 pm
Posts: 1771
Location: Server Elements
I think that adding overhead processes to the existing set is probably not a wise thing to do, especially considering the nature of NASLite. The need for monitoring, however, is obvious, so I think a good approach may be a simple client-side utility that can be configured to look for patterns in the syslog or other pages (SMART, etc.) and issue an event (alarm, mail, etc.) on a match.

The point is that the monitor can run on the client machine and not the server.

Just a thought...
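
To make the thought a little more concrete, here is a minimal Python sketch of such a client-side monitor: it polls a status page at an interval, looks for configured patterns, and raises each matching line only once. Everything specific in it is an assumption for illustration: the URL is a placeholder, the page is assumed to be HTML with one log entry per line or per `<br>`, and the function names are made up. The two patterns are the "unit degraded" and "unit impaired" wordings mentioned earlier in this thread.

```python
import re
import time
import urllib.request
from typing import Callable, List, Optional, Set

# Hypothetical address: substitute the URL your NAS serves its syslog page on.
SYSLOG_URL = "http://nas.example/syslog.html"

# Wordings mentioned in this thread; extend for SMART or other pages.
PATTERNS = [re.compile(p) for p in (r"Unit degraded", r"Unit impaired")]


def fetch_page(url: str = SYSLOG_URL, timeout: float = 10.0) -> str:
    """Download the status page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def to_lines(html: str) -> List[str]:
    """Crude HTML-to-text: turn <br> into newlines, drop all other tags."""
    text = re.sub(r"(?i)<br\s*/?>", "\n", html)
    text = re.sub(r"<[^>]+>", "", text)
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def new_matches(page: str, seen: Set[str]) -> List[str]:
    """Return matching lines not reported before, and remember them."""
    hits = []
    for line in to_lines(page):
        if line not in seen and any(p.search(line) for p in PATTERNS):
            seen.add(line)
            hits.append(line)
    return hits


def monitor(
    fetch: Callable[[], str] = fetch_page,
    notify: Callable[[str], None] = print,
    interval: float = 300.0,
    rounds: Optional[int] = None,
) -> None:
    """Poll the page; call notify() once per new matching line.

    rounds=None polls forever; a number stops after that many polls.
    """
    seen: Set[str] = set()
    done = 0
    while rounds is None or done < rounds:
        try:
            page = fetch()
        except OSError as exc:  # urllib's URLError is an OSError subclass
            notify(f"could not read syslog page: {exc}")
        else:
            for line in new_matches(page, seen):
                notify(f"ALERT: {line}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

The `notify` callable is where an alarm sound or an e-mail would be hooked in; `print` is just a stand-in. Remembering lines already reported keeps the same boot-time message from raising an alarm on every poll.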


 Post subject:
PostPosted: Tue Sep 19, 2006 3:58 am 

Joined: Fri Jun 16, 2006 5:09 am
Posts: 130
Tony, I can understand this point of view and I mostly agree. My initial question was whether the NASLite syslog would show such info, and I am not talking about the initial boot messages.

In my case, I can confirm that after I started the rebuild from the BIOS util and booted NASLite, the syslog showed that the rebuild had started. A couple of hours later it logged that it had completed.

What I am not sure about is whether the same behaviour can be expected if the RAID controller reports a degraded array. Would a msg appear in the syslog straight away, or would it only appear the next time the OS boots?

I guess I could try disconnecting a drive on the fly to confirm this, but..... if I was to do it by the book I would have to back up the contents of my array to other media and then try this....... a bit impractical.... :lol:


 Post subject:
PostPosted: Tue Sep 19, 2006 8:54 am 
Site Admin

Joined: Tue Jul 13, 2004 4:11 pm
Posts: 1771
Location: Server Elements
ALucas wrote:
Tony, I can understand this point of view and I mostly agree. My initial question was whether the NASLite syslog would show such info, and I am not talking about the initial boot messages.

In my case, I can confirm that after I started the rebuild from the BIOS util and booted NASLite, the syslog showed that the rebuild had started. A couple of hours later it logged that it had completed.

What I am not sure about is whether the same behaviour can be expected if the RAID controller reports a degraded array. Would a msg appear in the syslog straight away, or would it only appear the next time the OS boots?

I guess I could try disconnecting a drive on the fly to confirm this, but..... if I was to do it by the book I would have to back up the contents of my array to other media and then try this....... a bit impractical.... :lol:


Yes, the syslog will show kernel messages as they occur (delayed by the status update frequency, obviously), so if a hardware problem occurs during normal operation and is detected by the kernel, the syslog will reflect that. Keep in mind, however, that many of the messages are driver-specific, so some may be very explicit while others may be very vague. In general, though, anomalies during operation will be shown in the syslog.

Apologies for the confusion ;-)


 Post subject:
PostPosted: Sat Sep 23, 2006 3:39 am 

Joined: Sat Sep 23, 2006 2:38 am
Posts: 2
Maybe a dumb question, but don't RAID adapters that have kernel drivers put some kind of status in the /proc filesystem?


 Post subject:
PostPosted: Tue Dec 05, 2006 5:08 pm 

Joined: Thu Nov 30, 2006 8:25 pm
Posts: 10
Location: NY
How did you figure out which device failed? I am in the process of buying a 3ware card, and it is a concern that there is no way to tell what failed. I'm curious whether something in the messages gave you the clue.

Thanks.

ALucas wrote:
Just an update.
The failed drive (Unit 3) is now replaced and the array is rebuilt.

The initial syslog entry was:

Sep 17 10:13:20 user.warn kernel: 3w-xxxx: scsi1: AEN: INFO: Rebuild started: Unit #0.

followed by

Sep 17 12:37:08 user.warn kernel: 3w-xxxx: scsi1: AEN: INFO: Rebuild complete: Unit #0.

Looks like the RAID controller just repaid a part of its price.

I wish there was an audible alarm though when the array went to degraded status.....

I ran for a good 3 weeks with a degraded array without knowing it.


 Post subject:
PostPosted: Wed Dec 13, 2006 10:20 am 

Joined: Fri Jun 16, 2006 5:09 am
Posts: 130
I restarted the server and, using ALT-3 during the BIOS sequence, went into the 3ware BIOS utility, which showed which disk had been dropped from the array.

