Debian Linux and LSI MegaRAID SAS

This HowTo shows how to check the health of hard disks connected to an LSI Logic/Symbios Logic MegaRAID SAS 2108 RAID controller under Linux. Much of it is also useful for other hardware RAID controllers.

First, look for the controller in the system:

~] lspci | grep RAID
01:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2108 [Liberator] (rev 03)

Bingo! We can work with this one.

Install Linux utilities

LSI provides megacli, a proprietary command-line management utility. A Debian repository with packages of proprietary and open-source tools for most hardware RAID cards is available at hwraid.le-vert.net.

My system currently runs Debian bullseye. Add the repository to the /etc/apt/sources.list file in this format:

deb http://hwraid.le-vert.net/distrib branch main
  • distrib - either debian or ubuntu.
  • branch - squeeze, wheezy, jessie, stretch, buster or bullseye for Debian; precise, trusty, vivid, wily, xenial, etc. for Ubuntu.

For my server it is:

deb http://hwraid.le-vert.net/debian bullseye main

Edit your /etc/apt/sources.list and add the repository as the last line:

/etc/apt/sources.list
deb http://hwraid.le-vert.net/debian bullseye main
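
If you prefer not to type the line by hand, the format above can be generated by a small shell function. This is only a sketch: the function name hwraid_repo_line and the file name hwraid.list are made up for this example, and a separate file under /etc/apt/sources.list.d/ is just one possible alternative to appending to sources.list.

```shell
#!/bin/sh
# Print the apt source line for the hwraid repository.
#   $1 = distrib (debian or ubuntu)
#   $2 = branch  (for example bullseye)
hwraid_repo_line() {
    case "$1" in
        debian|ubuntu) ;;
        *) echo "unknown distrib: $1" >&2; return 1 ;;
    esac
    if [ -z "$2" ]; then
        echo "missing branch" >&2
        return 1
    fi
    printf 'deb http://hwraid.le-vert.net/%s %s main\n' "$1" "$2"
}

# Example (as root); hwraid.list is a hypothetical file name:
#   hwraid_repo_line debian bullseye > /etc/apt/sources.list.d/hwraid.list
```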

Packages are now signed, so run the following command after adding the repository to sources.list:

wget -O - https://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -

Run apt-get update, then install the MegaCli utility and the megaclisas-status wrapper script.

~] apt-get update
~] apt-get install megacli
~] apt-get install megaclisas-status

megacli

megacli is a proprietary tool by LSI which can perform both reporting and management for MegaRAID SAS cards. However, it is really hard to use because it takes tons of command-line parameters and there is no documentation.

Quickstart and output example

Get all adapters status and config:

~] megacli -AdpAllInfo -aAll
                                     
Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : ServeRAID M5014 SAS/SATA Controller
Serial No       : SV01506370
FW Package Build: 12.15.0-0199

                    Mfg. Data
                ================
Mfg. Date       : 04/10/10
Rework Date     : 00/00/00
Revision No     : 
Battery FRU     : N/A

                Image Versions in Flash:
                ================
FW Version         : 2.130.403-3588
BIOS Version       : 3.30.02.2_4.16.08.00_0x06060A05
Preboot CLI Version: 04.04-020:#%00009
WebBIOS Version    : 6.0-53-e_49-Rel
NVDATA Version     : 2.09.03-0051
Boot Block Version : 2.02.00.00-0000
BOOT Version       : 09.250.01.219

[...]

Get the status and type of logical drive 0 on adapter 0:

~] megacli -LDInfo -L0 -a0
                                     

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 135.972 GB
Sector Size         : 512
Mirror Data         : 135.972 GB
State               : Optimal
Strip Size          : 128 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
Is VD Cached: No



Exit Code: 0x00
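
For a quick health check, usually only the State line of this output matters. A minimal sketch of pulling it out with awk follows; the function name ld_state is made up for this example, and it assumes the output layout shown above.

```shell
#!/bin/sh
# Read "megacli -LDInfo" output on stdin and print the value of
# each "State" line (for example "Optimal").
ld_state() {
    awk -F': *' '
        /^State[ \t]*:/ {
            value = $2
            sub(/[ \t\r]+$/, "", value)   # drop trailing whitespace
            print value
        }
    '
}

# Example:
#   megacli -LDInfo -L0 -a0 | ld_state
```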

Show the physical disks on the first controller:

~] megacli -PDList -a0
                                     
Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Enclosure position: N/A
Device Id: 13
WWN: 5000C50023595FFC
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 1
Last Predictive Failure Event Seq Number: 54338
PD Type: SAS
Hotspare Information: 
Type: Global, is revertible

Raw Size: 136.731 GB [0x11176d60 Sectors]
Non Coerced Size: 136.231 GB [0x11076d60 Sectors]
Coerced Size: 135.972 GB [0x10ff2000 Sectors]
Sector Size:  0
Firmware state: Hotspare, Spun Up
Device Firmware Level: B62C
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50023595ffd
SAS Address(1): 0x0
Connected Port Number: 2(path0) 
Inquiry Data: IBM-ESXSST9146852SS     B62C3TB19TDM0324B62C    
IBM FRU/CRU: 42D0668     
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :32C (89.60 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : Yes



Enclosure Device ID: 252
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 10
WWN: 5000C500235A1D08
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 136.731 GB [0x11176d60 Sectors]
Non Coerced Size: 136.231 GB [0x11076d60 Sectors]
Coerced Size: 135.972 GB [0x10ff2000 Sectors]
Sector Size:  0
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: B62C
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c500235a1d09
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: IBM-ESXSST9146852SS     B62C3TB1H60G0324B62C    
IBM FRU/CRU: 42D0668     
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :35C (95.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 9
WWN: 5000C500235A08D8
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 136.731 GB [0x11176d60 Sectors]
Non Coerced Size: 136.231 GB [0x11076d60 Sectors]
Coerced Size: 135.972 GB [0x10ff2000 Sectors]
Sector Size:  0
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: B62C
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c500235a08d9
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: IBM-ESXSST9146852SS     B62C3TB1H5J50324B62C    
IBM FRU/CRU: 42D0668     
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :31C (87.80 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No




Exit Code: 0x00
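
The listing is long, and the interesting bits are easy to miss: in the output above, the hot spare in slot 0 reports Predictive Failure Count: 1 and a S.M.A.R.T alert. The sketch below condenses the listing to one line per drive. The function name pdlist_summary is made up for this example, and it assumes the field names appear exactly as in the output above.

```shell
#!/bin/sh
# Read "megacli -PDList" output on stdin and print one tab-separated
# line per drive: slot, firmware state, predictive failure count,
# S.M.A.R.T alert flag.
pdlist_summary() {
    awk -F': *' '
        function trim(s) { sub(/[ \t\r]+$/, "", s); return s }
        /^Slot Number:/              { slot  = trim($2) }
        /^Predictive Failure Count:/ { pfc   = trim($2) }
        /^Firmware state:/           { state = trim($2) }
        # The S.M.A.R.T line is the last one of each drive record.
        /^Drive has flagged a S\.M\.A\.R\.T alert/ {
            printf "%s\t%s\t%s\t%s\n", slot, state, pfc, trim($2)
        }
    '
}

# Example: show only drives that look suspicious.
#   megacli -PDList -a0 | pdlist_summary | awk -F'\t' '$3 > 0 || $4 == "Yes"'
```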

megaclisas-status

megaclisas-status is a wrapper script around megacli that reports a summarized RAID status and supports periodic checks. It is available from the same package repository.

The package comes with a Python wrapper around megacli and an initscript that periodically runs this wrapper to check the status. It keeps a file with the latest status and is thus able to detect RAID status changes and/or breakage. It logs a line to syslog when something fails and sends you a mail. Until the arrays are healthy again, a reminder is sent every 2 hours.
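
The idea of keeping the latest status in a file and comparing against it can be sketched in a few lines of shell. This only illustrates the principle and is not the code shipped in the package; the function name status_changed and the state file path in the example are made up.

```shell
#!/bin/sh
# Compare the current status line with the one saved by the previous
# run, then save the current one.
#   $1 = state file, $2 = current status line
# Returns 0 when the status changed (or no previous status exists),
# 1 when it is the same as last time.
status_changed() {
    previous=
    if [ -f "$1" ]; then
        previous=$(cat "$1")
    fi
    printf '%s\n' "$2" > "$1"
    [ "$2" != "$previous" ]
}

# Example (hypothetical state file path):
#   if status_changed /var/run/raid.status "$(megaclisas-status --nagios)"; then
#       logger "RAID status changed"
#   fi
```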

megaclisas-status output examples

Wrapper output example

~] megaclisas-status
-- Controller information --
-- ID | H/W Model                           | RAM    | Temp | BBU    | Firmware     
c0    | ServeRAID M5014 SAS/SATA Controller | 256MB  | N/A  | REPL   | FW: 12.15.0-0199 

-- Array information --
-- ID | Type   |    Size |  Strpsz | Flags | DskCache |   Status |  OS Path | CacheCade |InProgress   
c0u0  | RAID-1 |    136G |  128 KB | RA,WT | Disabled |  Optimal | /dev/sda | None      |None         

-- Disk information --
-- ID   | Type | Drive Model                              | Size     | Status          | Speed    | Temp | Slot ID  | LSI ID  
c0u0p0  | HDD  | IBM-ESXSST9146852SS B62C3TB1H60G0324B62C | 135.9 Gb | Online, Spun Up | 6.0Gb/s  | 35C  | [252:1]  | 10      
c0u0p1  | HDD  | IBM-ESXSST9146852SS B62C3TB1H5J50324B62C | 135.9 Gb | Online, Spun Up | 6.0Gb/s  | 32C  | [252:2]  | 9       

-- Unconfigured Disk information --
-- ID   | Type | Drive Model                              | Size     | Status              | Speed    | Temp | Slot ID  | LSI ID | Path    
c0uXpY  | HDD  | IBM-ESXSST9146852SS B62C3TB19TDM0324B62C | 135.9 Gb | Hotspare, Spun Up   | 6.0Gb/s  | 33C  | [252:0]  | 13     | N/A 

icinga2 integration

The script can be called with the --nagios parameter. This forces single-line output and makes the script return exit code 0 if everything is fine, or 2 if at least one thing is wrong. These are the standard Nagios plugin return codes for OK and CRITICAL.

~] megaclisas-status --nagios
RAID OK - Arrays: OK:1 Bad:0 - Disks: OK:3 Bad:0
~] echo $?
0
# find full path to megaclisas-status script
~] which megaclisas-status
/usr/sbin/megaclisas-status

# go to nagios plugins directory
~] cd /usr/lib/nagios/plugins/

# create symlink with name check_megaclisas_status
~] ln -s /usr/sbin/megaclisas-status check_megaclisas_status
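
For reference, the standard Nagios plugin return codes can be turned into readable names with a small helper. The function name nagios_state_name is made up for this example; as described above, megaclisas-status itself returns 0 or 2.

```shell
#!/bin/sh
# Translate a Nagios plugin exit code into its state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID($1)" ;;
    esac
}

# Example:
#   megaclisas-status --nagios; nagios_state_name $?
```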

run megaclisas-status as root

megaclisas-status needs root privileges to run. So go to the /etc/sudoers.d/ directory and create a file named monitoring with this content:

/etc/sudoers.d/monitoring
Cmnd_Alias     CMD_MONITORING = /usr/lib/nagios/plugins/check_megaclisas_status, /usr/sbin/megaclisas-status

nagios         ALL=(ALL) NOPASSWD: CMD_MONITORING

Check that it works:

~] su - nagios
~] sudo /usr/sbin/megaclisas-status --nagios
RAID OK - Arrays: OK:1 Bad:0 - Disks: OK:3 Bad:0
~] sudo /usr/lib/nagios/plugins/check_megaclisas_status --nagios
RAID OK - Arrays: OK:1 Bad:0 - Disks: OK:3 Bad:0
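
If sudo still asks for a password, check the file itself. sudo skips files in /etc/sudoers.d/ whose names end in ~ or contain a dot, so a name like monitoring.conf would be silently ignored. A small sketch of that name check follows; the function name sudoers_d_name_ok is made up for this example.

```shell
#!/bin/sh
# Return 0 if sudo would read a file with this name from
# /etc/sudoers.d/, 1 if it would be skipped (name contains a dot
# or ends in "~").
sudoers_d_name_ok() {
    case "$1" in
        *.*|*~) return 1 ;;
        *)      return 0 ;;
    esac
}

# Example:
#   sudoers_d_name_ok monitoring && echo "name is fine"
# The syntax of the file can be checked (as root) with:
#   visudo -cf /etc/sudoers.d/monitoring
```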

create icinga2 check command definition

Create megaclisas_status.conf file in your icinga2 config directory with this content:

megaclisas_status.conf
object CheckCommand "megaclisas_status" {
    command = [ "sudo", "/usr/lib/nagios/plugins/check_megaclisas_status" ]

    arguments = {
        "--nagios" = {
            required = true
        }
    }
}

create service config and add service to server

Go to the icinga2 config directory, create a file with the service definition and add the service to the server:

megaraid.conf
object Service "megaraid" {

  host_name = "monitoring.secar.cz"
  check_command = "megaclisas_status"

  check_interval = 1m
  retry_interval = 30s
  max_check_attempts = 5
}
  • host_name = "monitoring.secar.cz" - monitoring.secar.cz is the host name of the server this service is added to

restart icinga2 service

Check the integrity of the icinga2 configuration files, then restart the service:

~] icinga2 daemon -C
~] /etc/init.d/icinga2 restart
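
To avoid restarting with a broken configuration, the two commands can be chained so that the restart only happens when the check passes. A minimal sketch using the same two commands as above; the function name restart_if_valid is made up for this example.

```shell
#!/bin/sh
# Restart icinga2 only if the configuration check succeeds.
restart_if_valid() {
    if icinga2 daemon -C; then
        /etc/init.d/icinga2 restart
    else
        echo "configuration check failed, not restarting" >&2
        return 1
    fi
}

# Example (as root):
#   restart_if_valid
```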
