====== Guide to Managing a Dell PERC H710P RAID Controller ======

This guide provides instructions on how to check, manage, and maintain a Dell PERC H710P RAID controller on a Linux system (specifically Ubuntu/Debian).

===== Overview =====

The Dell PowerEdge RAID Controller (PERC) H710P is a hardware RAID controller that manages physical disks and presents them to the operating system as logical volumes, or "Virtual Drives". Proper management is critical for:
  * **Data Integrity:** Ensuring your data is not corrupted.
  * **Performance:** Optimizing read and write speeds.
  * **Redundancy:** Protecting against data loss from a single disk failure (depending on RAID level).

We will use the ''perccli'' command-line tool for all management tasks.

**NOTE:** Initial creation of a RAID array (Virtual Drive) is typically performed in the controller's configuration utility, accessible via a key press (e.g., Ctrl+R) during system boot. This guide focuses on management and verification from within the running operating system.

===== 1. Initial Setup: The perccli Tool =====

The official Dell tool, ''perccli'', is distributed as an ''.rpm'' package. On Debian-based systems like Ubuntu, you must first convert it to a ''.deb'' package using ''alien''.

**NOTE:** You will need to download the ''perccli'' tool from the [[https://www.dell.com/support/home/de-ch/drivers/driversdetails?driverid=wd0r5|Dell support website]]. Search for your server model or "PERC H710P" and find the appropriate download for Linux. The ''.rpm'' is usually packaged inside a ''.tar.gz'' archive.

First, install ''alien'' if you don't have it:
<code bash>
sudo apt-get update
sudo apt-get install alien
</code>

After downloading and extracting the ''perccli'' archive from Dell, change into its ''Linux'' subdirectory.

Convert the ''.rpm'' package to a ''.deb'' package (''-k'' keeps the original version number instead of incrementing it):
<code bash>
sudo alien -k perccli-007.0127.0000.0000-1.noarch.rpm
</code>

Install the newly generated ''.deb'' package:
<code bash>
sudo dpkg -i perccli_007.0127.0000.0000-1_all.deb
</code>
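If you script the conversion, it helps to know which file name ''alien -k'' will produce. The helper below is a minimal sketch of that naming rule; ''rpm_to_deb_name'' is a hypothetical function, not part of ''alien'', and it assumes a plain ''name-version-release.arch.rpm'' file name and handles only the ''noarch'' and ''x86_64'' architectures.

<code bash>
# Hypothetical helper: predict the .deb name that `alien -k` generates.
# Assumption: the input follows the usual name-version-release.arch.rpm pattern.
rpm_to_deb_name() {
  local base arch nvr release nv version name
  base="${1##*/}"          # strip any leading directory
  base="${base%.rpm}"      # perccli-007.0127.0000.0000-1.noarch
  arch="${base##*.}"       # noarch
  nvr="${base%.*}"         # perccli-007.0127.0000.0000-1
  release="${nvr##*-}"     # 1
  nv="${nvr%-*}"           # perccli-007.0127.0000.0000
  version="${nv##*-}"      # 007.0127.0000.0000
  name="${nv%-*}"          # perccli
  case "$arch" in
    noarch) arch=all ;;    # Debian calls architecture-independent packages "all"
    x86_64) arch=amd64 ;;
  esac
  printf '%s_%s-%s_%s.deb\n' "$name" "$version" "$release" "$arch"
}

rpm_to_deb_name perccli-007.0127.0000.0000-1.noarch.rpm
# prints: perccli_007.0127.0000.0000-1_all.deb
</code>

If the predicted name does not match what ''alien'' actually wrote, trust the file on disk (''ls *.deb'').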

The binary is installed in ''/opt/MegaRAID/perccli/''. To make it easier to use, create a symbolic link into a standard path:
<code bash>
sudo ln -s /opt/MegaRAID/perccli/perccli64 /usr/sbin/perccli64
</code>

A new shell picks up the symlink automatically. If your current shell does not find the command, clear its command-location cache (''hash -r'' in bash, ''rehash'' in zsh):
<code bash>
hash -r
</code>
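To confirm that the command now resolves before relying on it in scripts, a small guard like the following can be used. This is only a sketch; ''require_cmd'' is a hypothetical helper name. Note that on some Debian setups ''/usr/sbin'' is not on an unprivileged user's ''PATH'', so the check may pass only under ''sudo''.

<code bash>
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing: $1 (is it installed and on PATH?)" >&2
  return 1
}

# Example use (commented out so this snippet has no side effects):
# require_cmd perccli64 && sudo perccli64 show
</code>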

===== 2. Identifying the Controller =====

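First, confirm that the system sees the controller. ''lsscsi'' does not list the controller itself, but the virtual drive it exposes appears with the vendor ''DELL'' and model ''PERC H710P'':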
<code bash>
$ lsscsi -g
[0:2:0:0]    disk    DELL     PERC H710P       3.13  /dev/sda   /dev/sg0 
[1:0:0:0]    cd/dvd  PLDS     DVD+-RW DS-8A8SH KD11  /dev/sr0   /dev/sg1 
[2:0:0:0]    disk    ATA      Samsung SSD 850  2B6Q  /dev/sdb   /dev/sg2 
</code>

Next, use ''perccli'' to get a system-wide overview. This will show you the controller's index number (usually 0). We will refer to this as ''/c0'' in all subsequent commands.

<code bash>
$ sudo perccli64 show all

Status Code = 0
Status = Success
Description = None

Number of Controllers = 1
Host Name = cmpt3
Operating System  = Linux5.15.0-153-generic

System Overview :
===============
----------------------------------------------------------------------------
Ctl Model            Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth 
----------------------------------------------------------------------------
  0 PERCH710PAdapter     8   4   1     0   1     0 Opt On  3  N      0 Opt  
----------------------------------------------------------------------------
</code>

From the output above, we can see:
  * **Ctl (Controller Index):** ''0''. This is our ''/c0''.
  * **PDs (Physical Drives):** ''4'' drives are attached.
  * **VDs (Virtual Drives):** ''1'' virtual drive is configured.
  * **BBU (Battery Backup Unit):** ''Opt'' (Optimal). This is good!
  * **Hlth (Health):** ''Opt'' (Optimal). The overall controller health is good.
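For scripted checks, the same columns can be pulled out of the "System Overview" table by header name. The function below is a rough sketch that assumes the table layout shown above (a header row starting with ''Ctl'', followed by a dashed line and one data row for the first controller); ''overview_field'' is a hypothetical name.

<code bash>
# Hypothetical helper: print one column of the first controller row,
# selected by its header name (e.g. BBU, Hlth, PDs).
# Assumption: input looks like the "System Overview" table shown above.
overview_field() {
  awk -v want="$1" '
    # Header row: remember which column carries the requested name.
    $1 == "Ctl" && !col {
      for (i = 1; i <= NF; i++) if ($i == want) col = i
      hdr = 1
      next
    }
    # Skip the dashed separator lines after the header.
    hdr && $1 ~ /^-+$/ { next }
    # First data row after the header: print the requested column.
    hdr && col { print $col; exit }
  '
}

# Example use (commented out; requires the controller):
# sudo perccli64 show all | overview_field BBU
</code>

An empty result means the column name was not found, which is worth treating as an error in a monitoring script.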

===== 3. Checking an Existing Setup =====

If you are inheriting a server or just want to check the health of an existing array, the main command is ''show''.

<code bash>
sudo perccli64 /c0 show
</code>

This command provides a lot of information. The most important sections are ''TOPOLOGY'' and ''VD LIST''.
<file>
[...]
TOPOLOGY :
========

---------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT     Size PDC  PI SED DS3  FSpace TR 
---------------------------------------------------------------------------
 0 -   -   -        -   RAID5 Optl  N  8.185 TB dflt N  N   dflt N      N  
 0 0   -   -        -   RAID5 Optl  N  8.185 TB dflt N  N   dflt N      N  
 0 0   0    :4      4   DRIVE Onln  Y  2.728 TB dflt N  N   dflt -      N  
 0 0   1    :5      5   DRIVE Onln  Y  2.728 TB dflt N  N   dflt -      N  
 0 0   2    :7      7   DRIVE Onln  Y  2.728 TB dflt N  N   dflt -      N  
 0 0   3    :6      6   DRIVE Onln  Y  2.728 TB dflt N  N   dflt -      N  
---------------------------------------------------------------------------
[...]
Virtual Drives = 1

VD LIST :
=======

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name 
-------------------------------------------------------------
0/0   RAID5 Optl  RW     Yes     RWBD  -   ON  8.185 TB      
-------------------------------------------------------------
</file>

Key things to check:
  * **DRIVE State:** In the ''TOPOLOGY'' section, all drives should be ''Onln'' (Online). Any other state (e.g., ''Failed'', ''Rbld'') requires attention.
  * **VD State:** In the ''VD LIST'', the state should be ''Optl'' (Optimal). A ''Dgrd'' (Degraded) state means a drive has failed and you are running without redundancy.
  * **Cache:** ''RWBD'' decodes as ''R'' (Read Ahead), ''WB'' (Write-Back), and ''D'' (Direct IO); write-back generally gives the best write performance. If the write policy shows ''WT'' (Write-Through) instead, it may indicate a problem with the BBU.
  * **sCC:** ''ON'' means scheduled consistency checks are enabled, which is good practice.
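The drive-state check can be automated with a short filter over the ''TOPOLOGY'' rows. This sketch assumes the column order shown above (the sixth field is ''Type'', the seventh is ''State''); ''drives_not_online'' is a hypothetical name, and other firmware or tool versions may lay the table out differently.

<code bash>
# Hypothetical helper: list drives whose state is anything other than Onln.
# Assumption: TOPOLOGY columns are DG Arr Row EID:Slot DID Type State ...
drives_not_online() {
  awk '$6 == "DRIVE" && $7 != "Onln" { print $4, $7 }'
}

# Example use (commented out; requires the controller):
# sudo perccli64 /c0 show | drives_not_online
</code>

No output means every drive in the topology is online; any line printed gives the ''EID:Slot'' and the state that needs attention.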

===== 4. Essential Maintenance: Automatic Health Checks =====

RAID controllers can proactively check for and fix issues. The two most important automated tasks are the **Patrol Read** and the **Consistency Check**.

  * **Patrol Read (PR):** Scans the physical disks for bad blocks and remaps them //before// they cause an error during a read operation. This reduces the risk of hitting an unreadable sector on a surviving disk during a critical array rebuild.
  * **Consistency Check (CC):** Verifies the RAID parity data. It reads stripes and checks if the parity matches the data, correcting any errors it finds.

These tasks should be enabled and scheduled to run automatically during periods of low server activity (e.g., weekends, overnight).

==== 4.1. Checking Current Settings ====

Check the Patrol Read settings:
<code bash>
$ sudo perccli64 /c0 show pr
Controller = 0
Status = Success
Description = None

Controller Properties :
=====================
---------------------------------------------
Ctrl_Prop               Value                
---------------------------------------------
PR Mode                 Auto                 
PR Execution Delay      168 hours   
PR Next Start time      10/23/2025, 02:00:00 
[...]
</code>
Here, ''PR Mode'' is ''Auto'' and it runs every ''168 hours'' (1 week), which is excellent.

Check the Consistency Check settings:
<code bash>
$ sudo perccli64 /c0 show cc
Controller = 0
Status = Success
Description = None

Controller Properties :
=====================
-----------------------------------------------
Ctrl_Prop                 Value                
-----------------------------------------------
CC Operation Mode         Sequential           
CC Execution Delay        168                  
CC Next Starttime         10/15/2025, 02:00:00 
CC Current State          Stopped 
[...]
</code>
Here, ''CC Operation Mode'' is ''Sequential'' (scheduled checks are enabled and virtual drives are checked one after another) and it runs every ''168'' hours, which is also excellent.
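To read a single setting from either report in a script, the property table can be filtered by name. The function below is a sketch that assumes the two-column ''Ctrl_Prop'' / ''Value'' layout shown above, with the property name starting at the beginning of the line; ''ctrl_prop'' is a hypothetical name.

<code bash>
# Hypothetical helper: print the value of one controller property.
# Assumption: each row is "<property name><spaces><value>", as shown above.
ctrl_prop() {
  awk -v name="$1" '
    # Match rows that start with the exact name followed by a space.
    index($0, name) == 1 && substr($0, length(name) + 1, 1) == " " {
      val = substr($0, length(name) + 1)
      gsub(/^ +| +$/, "", val)   # trim padding around the value
      print val
      exit
    }'
}

# Example use (commented out; requires the controller):
# sudo perccli64 /c0 show pr | ctrl_prop "PR Mode"
# sudo perccli64 /c0 show cc | ctrl_prop "CC Operation Mode"
</code>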

==== 4.2. Enabling and Scheduling Checks (Best Practice) ====

If your checks are set to ''Manual'' or ''Disabled'', you **must** enable them. A good strategy is to run them weekly at different times.

**NOTE:** The ''starttime'' parameter sets the **next** time the task will run. The ''delay'' (e.g., 168 hours) makes it recur from that point onward.

**Example:** Schedule Consistency Check for every Saturday at 2 AM and Patrol Read for every Sunday at 2 AM.

To enable the automatic **Consistency Check**, set it to sequential mode. The command below sets it to run on a specific future date/time and repeat every 168 hours (7 days).
<code bash>
sudo perccli64 /c0 set cc=seq starttime=YYYY/MM/DD 02 delay=168
</code>

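To enable the automatic **Patrol Read**, set its mode to auto. The command below sets it to run on a specific future date/time and repeat every 168 hours. If your ''perccli'' version rejects the combined form, try the options as separate commands (''set patrolread=on mode=auto'', then ''set patrolread starttime=YYYY/MM/DD 02'', then ''set patrolread delay=168''), and confirm the result with ''show pr''.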
<code bash>
sudo perccli64 /c0 set patrolread mode=auto starttime=YYYY/MM/DD 02 delay=168
</code>
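The ''YYYY/MM/DD 02'' placeholder can be filled in automatically. The sketch below assumes GNU ''date'' (the default on Ubuntu/Debian), whose ''next <weekday>'' syntax always yields a date strictly after today; ''next_slot'' is a hypothetical helper, and the snippet only prints the values rather than changing the controller.

<code bash>
# Hypothetical helper: print "YYYY/MM/DD HH" for the next given weekday.
# Assumption: GNU date is available (date -d "next saturday").
next_slot() {
  # $1 = weekday name (e.g. saturday), $2 = two-digit hour (e.g. 02)
  printf '%s %s\n' "$(date -d "next $1" +%Y/%m/%d)" "$2"
}

# Print the start times for the example schedule; nothing is executed.
echo "Consistency Check starttime: $(next_slot saturday 02)"
echo "Patrol Read starttime:       $(next_slot sunday 02)"
</code>

The printed values can then be pasted into the ''starttime='' parameter of the commands above.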

===== 5. Essential Maintenance: Battery Backup Unit (BBU) =====

The BBU protects the data in the controller's write cache in case of a power failure. If the BBU is dead or failing, the controller normally falls back from Write-Back (''WB'') to Write-Through (''WT'') caching, which severely degrades write performance.

Check the BBU status regularly:
<code bash>
sudo perccli64 /c0/bbu show all
</code>

Look for the battery state in the output; it should be ''Optimal''. The battery periodically runs a "Learn Cycle" to recalibrate its capacity, during which performance may be temporarily reduced. If the battery is marked as ''Failed'' or ''Degraded'', it needs to be replaced.
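Because a failing BBU typically shows up as a write-through cache policy, the ''VD LIST'' table from ''/c0 show'' can serve as a secondary indicator. This sketch assumes the ''VD LIST'' column order shown in section 3 (the first field is ''DG/VD'', the sixth is ''Cache''); ''vd_write_through'' is a hypothetical name.

<code bash>
# Hypothetical helper: list virtual drives whose cache policy contains WT.
# Assumption: VD LIST columns are DG/VD TYPE State Access Consist Cache ...
vd_write_through() {
  awk '$1 ~ /^[0-9]+\/[0-9]+$/ && $6 ~ /WT/ { print $1, $6 }'
}

# Example use (commented out; requires the controller):
# sudo perccli64 /c0 show | vd_write_through
</code>

No output means no virtual drive is currently in write-through mode. A virtual drive may also be configured as write-through on purpose, so treat a hit as a prompt to check the BBU rather than as proof of a fault.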