The acronym SMART stands for Self-Monitoring, Analysis and Reporting Technology, a monitoring system built into most modern storage devices. The smartmontools package includes the utilities smartctl and smartd, which process SMART data to ‘provide advanced warning of disk degradation and failure’.
Step 1
Start by configuring nullmailer so that status updates from your system are delivered to you by email.
Step 2
Install smartmontools and smart-notifier, then update the smartmontools drive database to the latest version.
$ sudo apt-get install --yes smartmontools smart-notifier && sudo update-smart-drivedb
Step 3
Continue by obtaining relevant information about available storage devices.
$ sudo smartctl --scan
Depending on the type of disk, you should see one line per device, similar to the following.
/dev/sda -d scsi # /dev/sda, SCSI device
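The scan output can be reduced to each device path and its type, which is handy when several devices need to be checked in turn. The following is a minimal sketch, assuming the one-line-per-device format shown above; the function name scan_devices is hypothetical.

```shell
# Hypothetical helper: read `smartctl --scan` output on stdin and print
# "<device> <type>" for each detected device, e.g. "/dev/sda scsi".
scan_devices() {
  # Each line looks like: /dev/sda -d scsi # /dev/sda, SCSI device
  # Field 1 is the device path, field 3 is the argument given to -d.
  awk '$1 ~ /^\/dev\// && $2 == "-d" { print $1, $3 }'
}

# Usage (requires root):
# sudo smartctl --scan | scan_devices
```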
Step 4
Enable SMART support on the device and display detailed information about it, including its overall health status.
$ sudo smartctl -iHs on /dev/sda
Ideally, the device to be monitored is listed in the drive database.
=== START OF INFORMATION SECTION ===
Device is: In smartctl database [for details use: -P show]
The device should report a successful self-assessment test.
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
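Besides printing a verdict, smartctl reports problems through its exit status, which is a bitmask documented in the RETURN VALUES section of the smartctl(8) manual page. A minimal sketch that decodes it follows; the function name decode_smartctl_status is hypothetical and the messages paraphrase the manual page rather than quote it.

```shell
# Hypothetical helper: print one line per bit set in a smartctl exit status.
# Bit meanings paraphrase the RETURN VALUES section of smartctl(8).
decode_smartctl_status() {
  status=$1
  bit=0
  for msg in \
    "command line did not parse" \
    "device open failed, or device is in a low-power mode" \
    "a SMART or other ATA command failed, or a checksum error occurred" \
    "SMART status check returned DISK FAILING" \
    "prefail attributes are at or below threshold" \
    "some attributes were at or below threshold in the past" \
    "the device error log contains records of errors" \
    "the device self-test log contains records of errors"
  do
    # Print the message if bit number $bit is set in $status.
    if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
      echo "bit $bit: $msg"
    fi
    bit=$((bit + 1))
  done
}

# Usage (requires root):
# sudo smartctl -iHs on /dev/sda; decode_smartctl_status $?
```

An exit status of 0 produces no output, which corresponds to a device without any reported problems.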
Please note: the drive database does not cover NVMe devices, and SMART support for NVMe devices is currently limited to a subset of features.
Step 5
Verify the SMART capabilities of the device.
$ sudo smartctl -c /dev/sda
The following output confirms that the device /dev/sda supports both short and extended self-tests.
=== START OF READ SMART DATA SECTION ===
SMART capabilities:       (0x0003) Saves SMART data before entering power-saving mode.
                                   Supports SMART auto save timer.
Error logging capability:   (0x01) Error logging supported.
                                   General Purpose Logging supported.
Short self-test routine recommended polling time:    (  2) minutes.
Extended self-test routine recommended polling time: ( 85) minutes.
The output provides estimates for the duration of short and extended (long) self-test routines.
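The polling times can also be extracted programmatically, for instance to know how long to wait before reading the self-test log. The following is a minimal sketch, assuming smartctl's usual line-oriented output, in which the routine name and the polling time may share a line or be split across two consecutive lines; the function name polling_times is hypothetical.

```shell
# Hypothetical helper: read `smartctl -c` output on stdin and print
# "<test> <minutes>" for each recommended polling time it finds.
polling_times() {
  awk '
    # Remember which routine the next polling time belongs to; the label
    # may be on the same line as the value or on the line before it.
    /Short self-test routine/      { kind = "short" }
    /Extended self-test routine/   { kind = "long" }
    /Conveyance self-test routine/ { kind = "conveyance" }
    /recommended polling time:/ && kind != "" {
      # The value is enclosed in parentheses, e.g. "( 85) minutes."
      if (match($0, /\( *[0-9]+\)/)) {
        minutes = substr($0, RSTART + 1, RLENGTH - 2)
        gsub(/ /, "", minutes)
        print kind, minutes
      }
      kind = ""
    }
  '
}

# Usage (requires root):
# sudo smartctl -c /dev/sda | polling_times
```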
If the device is capable of self-tests, use the following command to run a short self-test.
$ sudo smartctl -t short /dev/sda
Use the following command to run a long self-test.
$ sudo smartctl -t long /dev/sda
Display the results of recent self-tests in reverse chronological order (most recent first).
$ sudo smartctl -l selftest /dev/sda
All tests should have completed without error.
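For more than one drive, a small filter can flag any log entry whose status is not "Completed without error". The following is a minimal sketch, assuming the usual layout of `smartctl -l selftest`, in which each entry starts with "#" followed by its number; the function name failed_selftests is hypothetical.

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and print
# every log entry whose status is not "Completed without error".
# A test that is still running is printed as well, since it has not
# completed yet.
failed_selftests() {
  # Log entries look like: "# 1  Short offline  Completed without error ..."
  awk '/^# *[0-9]/ && !/Completed without error/'
}

# Usage (requires root):
# sudo smartctl -l selftest /dev/sda | failed_selftests
```

Empty output means that every logged test completed without error.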
Step 6
Edit the default configuration file /etc/smartd.conf and comment out any DEVICESCAN lines. This prevents smartd from searching for attached devices and monitoring them indiscriminately.
On Debian 12, you can use the following command to comment out the DEVICESCAN line in the default configuration file.
$ sudo sed -i 's/DEVICESCAN -d removable -n standby -m root -M exec/#DEVICESCAN -d removable -n standby -m root -M exec/' /etc/smartd.conf
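The command above only matches the exact default line shipped with Debian 12. A more general variant is sketched below, assuming that each DEVICESCAN directive starts at the beginning of a line (optionally after whitespace); it comments out every such line and keeps a backup copy using the suffix form of sed's -i option. The function name disable_devicescan is hypothetical, and the demonstration runs on a temporary file rather than on /etc/smartd.conf.

```shell
# Hypothetical helper: comment out every line of file $1 that starts with
# DEVICESCAN (optionally after whitespace), keeping the original as $1.bak.
disable_devicescan() {
  sed -i.bak 's/^[[:space:]]*DEVICESCAN/#&/' "$1"
}

# Demonstration on a temporary file instead of /etc/smartd.conf.
conf=$(mktemp)
printf '%s\n' '# comment' 'DEVICESCAN -d removable -n standby -m root -M exec' > "$conf"
disable_devicescan "$conf"
cat "$conf"    # the DEVICESCAN line is now prefixed with "#"
rm -f "$conf" "$conf.bak"

# On a real system:
# sudo sed -i.bak 's/^[[:space:]]*DEVICESCAN/#&/' /etc/smartd.conf
```

Lines that are already commented out are left unchanged, so the command can be run more than once.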
Step 7
Example configuration for smartd and SATA devices
For the device /dev/sda, the following line would have to be added to /etc/smartd.conf to monitor the device with smartd.
/dev/sda -H -l error -l selftest -S on -s (L/../.././06|S/../.././18) -m root -M test
-H
check the overall health status reported by the device and warn if it indicates imminent failure
-l error
report any increase in the number of errors in the SMART error log since the last check
-l selftest
report any increase in the number of failed tests in the SMART Self-Test Log since the last check
-S on
enable Attribute Autosave on startup
-s (L/../.././06|S/../.././18)
schedule a long self-test between 06:00 and 07:00 daily and a short self-test between 18:00 and 19:00 daily
-m root
send warning emails to the local user root
-M test
send a test email on startup
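The -s schedule is a regular expression that smartd matches against a string of the form T/MM/DD/d/HH: the test type, the month, the day of the month, the day of the week (1 = Monday to 7 = Sunday) and the hour. The following is a minimal sketch for trying out a schedule expression before putting it into /etc/smartd.conf. It approximates smartd's matching with grep -E and requires the whole string to match, which is an assumption rather than a copy of smartd's own code; the function names schedule_matches and schedule_string are hypothetical.

```shell
# Hypothetical helper: succeed if the -s expression $1 selects the
# T/MM/DD/d/HH string $2. Approximation: grep -E, whole-string match.
schedule_matches() {
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Hypothetical helper: build the T/MM/DD/d/HH string for test type $1 at the
# current time (date's %u gives the day of the week, 1 = Monday ... 7 = Sunday).
schedule_string() {
  printf '%s/%s\n' "$1" "$(date +%m/%d/%u/%H)"
}

schedule='(L/../.././06|S/../.././18)'
if schedule_matches "$schedule" "$(schedule_string L)"; then
  echo "a long self-test would be due this hour"
fi
```

For example, with the schedule above, L/03/15/5/06 (a long test on a Friday at 06:00) matches, while S/03/15/5/06 does not.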
Example configuration for smartd and NVMe devices
Current versions of smartmontools offer experimental support for NVMe devices. In practice this means that only a limited, but still useful, feature set is available.
For the device /dev/nvme0, the following line would have to be added to the end of /etc/smartd.conf to monitor the device with smartd.
/dev/nvme0 -H -l error -m root -M test
-H
check the overall health status reported by the device and warn if it indicates imminent failure
-l error
report any increase in the number of error log entries since the last check
-m root
send warning emails to the local user root
-M test
send a test email on startup
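Instead of editing the file by hand, a device line can be appended only when it is not already present, so that the step can be repeated safely. The following is a minimal sketch, assuming configuration lines like the ones shown above; the function name add_smartd_line is hypothetical, and on a real system /etc/smartd.conf can only be written by root.

```shell
# Hypothetical helper: append line $2 to file $1 unless an identical line
# already exists; tee echoes the appended line so the change is visible.
add_smartd_line() {
  conf=$1
  line=$2
  # -x: whole-line match, -F: fixed string (the line contains regex
  # characters such as "(" and "|"), -s: no error if the file is missing.
  if ! grep -qsxF -- "$line" "$conf"; then
    printf '%s\n' "$line" | tee -a "$conf"
  fi
}

# Usage, as root (for example in a shell opened with sudo -i):
# add_smartd_line /etc/smartd.conf '/dev/nvme0 -H -l error -m root -M test'
```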