====== Guide to Managing an HP Smart Array P420i/P420p RAID Controller ======
This guide provides instructions on how to check, manage, and maintain an HP Smart Array P420i/P420p RAID controller on a Linux system (specifically Ubuntu/Debian).
===== Overview =====
The HP Smart Array P420i (integrated) and P420p (PCIe card) are hardware RAID controllers that manage physical disks and present them to the operating system as "Logical Drives". Proper management is critical for:
* **Data Integrity:** Ensuring your data is not corrupted.
* **Performance:** Optimizing read and write speeds.
* **Redundancy:** Protecting against data loss from a single disk failure (depending on RAID level).
We will use the ''ssacli'' command-line tool for all management tasks.
**NOTE:** Initial creation of a RAID array (Logical Drive) is typically performed before the operating system boots, either in Smart Storage Administrator (reachable through Intelligent Provisioning, F10 at startup) or in the controller's own option ROM utility (F8 during POST on many ProLiant generations). This guide focuses on management and verification from within the running operating system.
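===== 1. Initial Setup: The ssacli Tool =====
''ssacli'' is not part of the default Ubuntu/Debian repositories. [[https://gist.github.com/mrpeardotnet/a9ce41da99936c0175600f484fa20d03|This guide]] describes how to add the repository to apt on Ubuntu so that the tool can be installed.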
===== 2. Identifying the Controller =====
First, confirm the system sees the controller. The ''lsscsi'' command is useful for this.
$ lsscsi -g
[4:0:0:0] storage HP P420 4.68 - /dev/sg0
[4:1:0:0] disk HP LOGICAL VOLUME 4.68 /dev/sda /dev/sg1
[5:0:0:0] storage HP P420i 8.32 - /dev/sg2
[5:1:0:0] disk HP LOGICAL VOLUME 8.32 /dev/sdb /dev/sg3
[6:0:0:0] storage HP P420 8.00 - /dev/sg4
If you are unsure which block device belongs to which controller, cross-check the logical volumes against the sizes reported by ''lsblk'':
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 8.2T 0 disk
sdb 8:16 0 953.8G 0 disk
├─sdb1 8:17 0 1M 0 part
├─sdb2 8:18 0 2G 0 part /boot
└─sdb3 8:19 0 951.8G 0 part
├─ubuntu--vg-ubuntu--lv 252:0 0 100G 0 lvm /
└─ubuntu--vg-lv--swap 252:1 0 32G 0 lvm [SWAP]
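In the ''lsscsi'' output above, each logical volume shares its SCSI host number (the first number in ''[H:C:T:L]'') with the controller that presents it. The following is a minimal sketch that groups the output by that host number. It assumes the column layout shown above and that the host-number pairing holds on your system; ''map_controllers'' is a hypothetical helper name, not part of ''lsscsi''.

```shell
#!/bin/sh
# Sketch: group `lsscsi -g` output by SCSI host number so that each
# controller ("storage" line) is listed with the block devices ("disk"
# lines) on the same host. Assumes the column layout shown in this guide.
map_controllers() {
    awk '
        {
            # Extract H from the leading [H:C:T:L] tuple.
            host = $1
            sub(/^\[/, "", host)
            sub(/:.*/, "", host)
        }
        $2 == "storage" {
            model[host] = $4
            if (!(host in seen)) { order[++n] = host; seen[host] = 1 }
        }
        # The block device is the second-to-last column on disk lines.
        $2 == "disk" { disks[host] = disks[host] " " $(NF - 1) }
        END {
            for (i = 1; i <= n; i++) {
                h = order[i]
                printf "host %s: %s ->%s\n", h, model[h], ((h in disks) ? disks[h] : " (none)")
            }
        }
    '
}

# Usage (requires lsscsi):
#   lsscsi -g | map_controllers
```

Disk lines whose host has no matching ''storage'' line are not printed, so compare the result with ''lsblk'' if a device seems to be missing.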
Next, use ''ssacli'' to get a system-wide overview. This will show you the controller's slot number. We will refer to this as ''slot=X'' in all subsequent commands (it is very often ''slot=0'').
$ sudo ssacli ctrl all show status
Smart Array P420i in Slot 0 (Embedded)
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK
From the output above, we can see:
* **Controller:** ''Smart Array P420i in Slot 0''. This is our controller.
* **Controller Status:** ''OK''. The controller hardware is healthy.
* **Cache Status:** ''OK''. The write cache is operational.
* **Battery/Capacitor Status:** ''OK''. The Flash-Backed Write Cache (FBWC) power source is healthy.
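For unattended monitoring, the status summary can be filtered for anything that is not ''OK''. This is a sketch only: it assumes the output format shown above (a controller header containing "in Slot", followed by ''... Status: value'' lines), and ''check_ctrl_status'' is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: read `ssacli ctrl all show status` output on stdin and print
# every "... Status:" line whose value is not "OK", prefixed with the
# controller it belongs to. Returns 0 if everything is OK, 1 otherwise.
check_ctrl_status() {
    awk '
        /in Slot/ {
            # Remember the most recent controller header line.
            ctrl = $0
            sub(/^[ \t]+/, "", ctrl)
            next
        }
        /Status:/ {
            line = $0
            sub(/^[ \t]+/, "", line)
            value = line
            sub(/^[^:]*:[ \t]*/, "", value)   # keep only the text after the first colon
            sub(/[ \t\r]+$/, "", value)
            if (value != "OK") {
                printf "%s: %s\n", ctrl, line
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (requires ssacli and root privileges):
#   sudo ssacli ctrl all show status | check_ctrl_status || echo "RAID controller needs attention"
```

The non-zero exit status makes the function easy to call from cron or from a monitoring agent.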
===== 3. Checking an Existing Setup =====
If you are inheriting a server or just want to check the health of an existing array, the main command is ''show config detail''.
sudo ssacli ctrl slot=0 show config detail
This command provides a lot of information. The most important sections are the main controller block, the ''logicaldrive'' blocks, and the ''physicaldrive'' blocks.
Smart Array P420i in Slot 0 (Embedded)
Bus Interface: PCI
[...]
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK
Array A (SAS, Unused Space: 0 MB)
logicaldrive 1 (931.5 GB, RAID 5, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
Key things to check:
* **Controller/Cache/Capacitor Status:** These should all be ''OK''. Any other state requires investigation.
* **logicaldrive Status:** Each logical drive should report ''OK''. A degraded state (''ssacli'' usually reports it as ''Interim Recovery Mode'') means a drive has failed and you are running without full redundancy.
* **physicaldrive Status:** All physical drives should be ''OK''. A drive in any other state (e.g., ''Failed'', ''Predictive Failure'') needs attention and should normally be replaced.
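The checks listed above can be automated with a small filter. This sketch assumes the one-line summary format shown above, where the status appears as one of the comma-separated fields inside the parentheses (this is also the format printed by ''ssacli ctrl slot=0 show config''). ''find_unhealthy_drives'' is a hypothetical name.

```shell
#!/bin/sh
# Sketch: read an ssacli configuration listing on stdin and report every
# logicaldrive/physicaldrive summary line that has no "OK" field.
# Returns 0 if all drives are OK, 1 otherwise.
find_unhealthy_drives() {
    awk '
        $1 == "logicaldrive" || $1 == "physicaldrive" {
            inner = $0
            sub(/^[^(]*\(/, "", inner)      # drop everything up to the opening parenthesis
            sub(/\)[ \t\r]*$/, "", inner)   # drop the closing parenthesis
            n = split(inner, field, /, */)
            ok = 0
            for (i = 1; i <= n; i++)
                if (field[i] == "OK") ok = 1
            if (!ok) {
                # The last field is assumed to carry the status text.
                print $1, $2 ": " field[n]
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (requires ssacli and root privileges):
#   sudo ssacli ctrl slot=0 show config | find_unhealthy_drives
```

Checking for the presence of an ''OK'' field, rather than comparing only the last field, is a deliberate choice so that a line with a trailing qualifier after the status is not misreported.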
===== 4. Essential Maintenance: Automatic Health Checks =====
HP Smart Array controllers proactively check for and fix issues. The two most important automated tasks are the **Surface Scan** and the **Parity Scan (Consistency Check)**.
* **Surface Scan:** Scans the physical disks for bad blocks (media errors) and remaps them //before// they cause an error during a read operation. This is HP's equivalent of a Patrol Read and is crucial for preventing a disk from failing during a critical array rebuild.
* **Parity Scan (Consistency Check):** Verifies the RAID parity data. It reads stripes and checks if the parity matches the data, correcting any errors it finds.
==== 4.1. Checking Current Settings ====
All relevant health check settings are displayed in the detailed configuration output.
sudo ssacli ctrl slot=0 show config detail
In the output for the controller, look for the following lines:
Surface Scan Delay: 15 secs
Surface Scan Mode: Idle
[...]
* **Surface Scan Mode** should be ''Idle'', meaning it runs when the controller is not busy.
* **Surface Scan Delay** is how long the controller waits for I/O to be idle before starting a scan. The default is fine.
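Because the detailed output is long, it can help to pull out just these two settings. The following sketch assumes the ''Surface Scan Mode:'' and ''Surface Scan Delay:'' labels shown above; ''surface_scan_settings'' is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `ssacli ctrl slot=X show config detail` output on stdin and
# print the surface scan settings as key=value pairs.
# Returns 1 if neither setting was found, so an empty result is noticed.
surface_scan_settings() {
    awk -F': *' '
        /Surface Scan (Mode|Delay):/ {
            key = $1
            sub(/^[ \t]+/, "", key)     # strip the indentation used by ssacli
            val = $2
            sub(/[ \t\r]+$/, "", val)
            print key "=" val
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (requires ssacli and root privileges):
#   sudo ssacli ctrl slot=0 show config detail | surface_scan_settings
```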
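Unlike in some other vendors' tools, the consistency check is not a separately scheduled task. On fault-tolerant logical drives (for example RAID 5 or RAID 6), the background surface scan also verifies parity consistency at low priority. In the same output you can confirm that the initial parity build has finished for a given logical drive; note that this line reports parity initialization, not the result of a later consistency check: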
logicaldrive 1 (931.5 GB, RAID 5, OK)
[...]
Parity Initialization Status: Initialization Completed
[...]
==== 4.2. Modifying Health Check Behavior (Best Practice) ====
By default, HP controllers have sane settings for these checks. You typically do not need to schedule them. However, you can modify their behavior.
**Surface Scan**
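The scan mode and delay are controller settings that ''ssacli'' changes through ''ctrl slot=X modify'', using the ''surfacescanmode'' (''disable'', ''idle'', ''high'') and ''surfacescandelay'' keywords; check ''ssacli help'' for the values your version accepts. Running the scan at a specific time of day is not built in and would have to be scripted manually. In general, the default "Idle" mode is sufficient for most use cases, as it automatically performs the scan during periods of low I/O.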
**Consistency Check (Parity Scan)**
While this is an automatic background process, you can modify its priority or trigger it manually. For example, if you suspect an issue, you can start a check on logical drive 1 in slot 0:
sudo ssacli ctrl slot=0 logicaldrive 1 startconsistencycheck
You can also adjust the priority of the background consistency check. Setting it higher will complete the check faster but may have a greater performance impact on the server.
sudo ssacli ctrl slot=0 modify consistencycheckpriority=medium
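(Options are ''low'', ''medium'', ''high''. The default is ''low''.) Keyword support differs between ''ssacli'' releases and controller firmware, so confirm both of the commands above against ''ssacli help'' before relying on them.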
===== 5. Essential Maintenance: Flash-Backed Write Cache (FBWC) =====
The P420i and P420p use a Flash-Backed Write Cache (FBWC) module that is powered by a super-capacitor rather than a battery (BBU). The module preserves the contents of the controller's write cache if power is lost. If the capacitor is dead or failing, the controller disables the write cache, which severely degrades write performance.
Check the cache and capacitor status regularly with the ''show status'' or ''show detail'' command.
sudo ssacli ctrl slot=0 show detail
Look for the ''Cache Status'' and ''Battery/Capacitor Status'' lines.
Smart Array P420i in Slot 0 (Embedded)
[...]
Controller Status: OK
Cache Status: OK
Cache Status Details: The cache is configured.
[...]
Battery/Capacitor Status: OK
Both values must be ''OK''. A ''Failed'' or ''Degraded'' status indicates that the cache module or its capacitor needs to be replaced.
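A regular check can be scripted so that a missing line is treated as a problem rather than silently ignored. This is a sketch under assumptions: it accepts either a ''Capacitor Status:'' or a ''Battery/Capacitor Status:'' label, and ''check_cache_health'' is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: read `ssacli ctrl slot=X show detail` output on stdin, print the
# cache and capacitor status on one line, and return 0 only if both lines
# are present and both report "OK".
check_cache_health() {
    awk '
        # Return the text after the first colon, trimmed.
        function value(line) {
            sub(/^[^:]*:[ \t]*/, "", line)
            sub(/[ \t\r]+$/, "", line)
            return line
        }
        # "Cache Status Details:" does not match this pattern.
        /^[ \t]*Cache Status:/                  { cache = value($0) }
        /^[ \t]*(Battery\/)?Capacitor Status:/  { cap = value($0) }
        END {
            if (cache == "") cache = "missing"
            if (cap == "") cap = "missing"
            printf "cache=%s capacitor=%s\n", cache, cap
            exit ((cache == "OK" && cap == "OK") ? 0 : 1)
        }
    '
}

# Usage (requires ssacli and root privileges):
#   sudo ssacli ctrl slot=0 show detail | check_cache_health || echo "FBWC needs attention"
```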