Bad block data rescue under Linux

By /robex/, June 2020.

Abstract

Obligatory disclaimer before proceeding: always back up your valuable data regularly. I did this on a drive that had unimportant files. It's also important to note that a file with damaged blocks will not always be useful after rescue, even if most of it is correct, especially if the format contains some sort of checksum (as PNG images or executables do).

Under Linux, if your HDD (/dev/sdb from now on) starts to fail or has a bunch of damaged sectors, eventually you'll start to face errors similar to this:

cat: file.txt: Input/output error

Accompanied by some scary dmesg logs:

[ 2864.580952] blk_update_request: I/O error, dev sdb, sector 573781912
[ 2864.583314] ata3: EH complete
[ 2867.429748] ata3.00: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0
[ 2867.432805] ata3.00: irq_stat 0x40000008
[ 2867.435798] ata3.00: failed command: READ FPDMA QUEUED
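The sector number in the first log line can be turned into a byte offset, or probed directly to confirm that the read really fails. What follows is only a minimal sketch: it assumes the kernel reports sectors in 512-byte units, and the function names are made up for illustration.

```shell
# Byte offset of a kernel-reported sector (512-byte units assumed).
sector_to_offset() {
    echo $(( $1 * 512 ))
}

# Try to read a single 512-byte sector and throw the data away.
# $1 is the device (or any regular file), $2 is the sector number.
# A non-zero exit status means dd could not read it.
probe_sector() {
    dd if="$1" of=/dev/null bs=512 skip="$2" count=1 2>/dev/null
}
```

For the log above, sector_to_offset 573781912 prints 293776338944. Note that probe_sector may be answered from the page cache if the sector was read successfully before, so a success is less conclusive than a failure.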

Checking the drive's SMART data with smartctl -a /dev/sdb (output partly shown here) should help diagnose the problem:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -       52

This value indicates that there are currently 52 unstable sectors with unrecoverable read errors, which will only be remapped after a write or a successful read. Files or partitions with chunks located in these sectors cannot be copied with regular Linux utilities such as cp or rsync, nor can they be deleted with rm.
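If you want to keep an eye on this counter while working on the drive, the raw value can be pulled out of the attribute table with awk. This is a small sketch that assumes the ten-column layout shown above; the function name is my own.

```shell
# Print the raw Current_Pending_Sector count.
# Expects the SMART attribute table on standard input.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}
```

It would be used as smartctl -A /dev/sdb | pending_sectors, where -A limits the output to the attribute table. Drives that do not report this attribute simply produce no output.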

The aim of this article is to show how to recover the good parts of files or partitions while ignoring, zeroing out or truncating the bad sectors.

Software comparison

In this article we will compare three popular tools that are regularly used for data rescue:

dd: the classic UNIX tool (well, actually the GNU one).
ddrescue: a dedicated data recovery tool similar to dd.
sdd: a fork of dd that is part of schily-tools, by Jörg Schilling. Not offered in apt repositories but trivial to compile.

So, why all these different tools? Well, in my case I was recovering a bunch of video streams downloaded from the internet that had gotten corrupted, so even if some parts of the file were bad, as long as the metadata was fine they would still play (albeit with some visual artifacts). However, it was important that the total file size remained the same, since the metadata contains chunk lengths and other size-related information.

My initial idea was to use a tool that would write zeros over the bad sectors, so that I could easily spot the corruption with a hex editor. I assumed this would be the default behavior of good old dd, or at least an option, but it's not. Information on what exactly each tool does with bad sectors wasn't easy to find online, which is why I decided to put it all together here.

dd

By default dd aborts when it encounters an I/O error, but you can force it to keep going with the option conv=noerror. However, the bad sectors are then dropped from the output, resulting in a smaller file, as seen in the following example:

$ dd if=vid.mp4 of=vid.mp4.dd_noerror conv=noerror
...
-rw-rw-r-- 1 user user  415312565 Jun 22 07:12 vid.mp4
-rw-rw-r-- 1 user user  415300277 Jun 22 06:32 vid.mp4.dd_noerror

dd also offers the option conv=sync, described in the manual as "pad every input block with NULs to ibs-size". This is close to what we want, except that it also pads the last block if it is incomplete (dd defaults to a 512-byte block size), resulting in a bigger file in most cases:

$ dd if=vid.mp4 of=vid.mp4.dd_noerror,sync conv=noerror,sync
...
-rw-rw-r-- 1 user user  415312565 Jun 22 07:12 vid.mp4
-rw-rw-r-- 1 user user  415312896 Jun 22 06:38 vid.mp4.dd_noerror,sync
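The extra bytes are just the final block being rounded up to a multiple of 512, so they can be predicted and cut off again. The sketch below is not something I used for the recovery itself: it assumes GNU coreutils (stat -c and truncate) and the default 512-byte block size, and both function names are invented.

```shell
# Size that conv=sync produces for an input of $1 bytes
# with a block size of $2 bytes (512 if omitted).
padded_size() {
    size=$1
    bs=${2:-512}
    echo $(( (size + bs - 1) / bs * bs ))
}

# Copy $1 to $2 with zero padding on errors, then cut the
# output back to the length of the source file.
copy_padded_and_trim() {
    dd if="$1" of="$2" conv=noerror,sync 2>/dev/null
    truncate -s "$(stat -c %s "$1")" "$2"
}
```

padded_size 415312565 prints 415312896, which matches the listing above (331 bytes of padding). With a larger block size a single bad sector would zero out the whole block around it, so the default of 512 is the safer choice here.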

ddrescue

ddrescue is a dedicated recovery tool, so it tries to minimize disk accesses as much as possible. By default it makes the recovered file the same size as the original, but it doesn't write anything at the offsets of the bad sectors in the output (recovered) file, so whatever data was there before remains. It also keeps a log of the bad sectors, which can later be supplied to other tools like badblocks or fsck in order to try to repair or remap faulty sectors:

$ ddrescue vid.mp4 vid.mp4.ddrescue_default rescue.log
...
-rw-rw-r-- 1 user user  415312565 Jun 22 07:12 vid.mp4
-rw-rw-r-- 1 user user  415312565 Jun 22 06:49 vid.mp4.ddrescue_default
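The log is plain text, so the bad ranges can be listed with a few lines of shell. This sketch assumes the usual layout of the file: comment lines starting with #, one status line, and then data lines of the form "position size status" in hexadecimal, where a status of - marks a bad-sector block. The function name is hypothetical.

```shell
# List the bad-sector blocks recorded in a ddrescue log.
# $1 is the log file; output is "offset size" in decimal bytes.
bad_ranges() {
    grep -v '^#' "$1" | while read -r pos size status; do
        # The status line has no "-" in its third field, so it is skipped.
        if [ "$status" = "-" ]; then
            printf '%d %d\n' "$pos" "$size"
        fi
    done
}
```

Running bad_ranges rescue.log then gives one line per hole, which is handy for jumping to the right offset in a hex editor.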

Unlike dd, ddrescue has an option to fill the bad blocks with whatever you want. It's called "fill mode" and usage is as follows:

$ ddrescue --fill=- input vid.mp4.ddrescue_default rescue.log

rescue.log is the log file generated by the default execution of ddrescue, and input is a user-created file that contains the pattern to be written over the holes left by bad sectors. The - after --fill= selects the blocks marked as bad sectors in the log. If you want zeros, just make a file that contains a single NUL byte (0x00) with a hex editor or the following command:

$ printf '\0' > input

As an example, here is a hex dump of one of the holes in the recovered file after filling it with the string "BAD BLOCK":

...
092bbfe0h:  7f e7 b1 a9 7b 16 2b cb  99 7b d4 42 8c ee ed ef  ....{.+..{.B....
092bbff0h:  7b b4 eb 70 5f 5f 18 c5  1a 6b 5c 13 80 9e 57 3c  {..p__...k\...W.
092bc000h:  42 41 44 20 42 4c 4f 43  4b 42 41 44 20 42 4c 4f  BAD BLOCKBAD BLO
092bc010h:  43 4b 42 41 44 20 42 4c  4f 43 4b 42 41 44 20 42  CKBAD BLOCKBAD B
092bc020h:  4c 4f 43 4b 42 41 44 20  42 4c 4f 43 4b 42 41 44  LOCKBAD BLOCKBAD
092bc030h:  20 42 4c 4f 43 4b 42 41  44 20 42 4c 4f 43 4b 42   BLOCKBAD BLOCKB
092bc040h:  41 44 20 42 4c 4f 43 4b  42 41 44 20 42 4c 4f 43  AD BLOCKBAD BLOC
092bc050h:  4b 42 41 44 20 42 4c 4f  43 4b 42 41 44 20 42 4c  KBAD BLOCKBAD BL
092bc060h:  4f 43 4b 42 41 44 20 42  4c 4f 43 4b 42 41 44 20  OCKBAD BLOCKBAD
...

sdd

sdd has basically the same syntax as dd, but with -noerror it fills the bad sectors with zeros instead of dropping them from the output file. It also shows progress and has nicer output (in my opinion):

$ sdd if=vid.mp4 of=vid.mp4.sdd_noerror -noerror
...
-rw-rw-r-- 1 user user  415312565 Jun 22 07:12 vid.mp4
-rw-rw-r-- 1 user user  415312565 Jun 22 06:12 vid.mp4.sdd_noerror

And for comparison, the same hole shown earlier:

...
092bbfe0h:  7f e7 b1 a9 7b 16 2b cb  99 7b d4 42 8c ee ed ef  ....{.+..{.B....
092bbff0h:  7b b4 eb 70 5f 5f 18 c5  1a 6b 5c 13 80 9e 57 3c  {..p__...k\...W.
092bc000h:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
092bc010h:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
092bc020h:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
092bc030h:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
092bc040h:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
092bc050h:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
092bc060h:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
...
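Since the two recoveries should only differ inside the holes (one has "BAD BLOCK", the other zeros), comparing them gives a quick count of how many bytes were lost. This is a sketch that assumes both files come from the same source and have the same size; the function name is made up.

```shell
# Count the bytes that differ between two files of equal size.
# cmp -l prints one line per differing byte.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

For example, count_diff_bytes vid.mp4.ddrescue_default vid.mp4.sdd_noerror. The result should be a multiple of the sector size if both tools skipped the same sectors.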

Conclusions

Both ddrescue and sdd do the job fine (sdd in a single command with easier syntax), but I recommend sticking with ddrescue, especially if you're recovering whole disks or partitions, as it is a more popular and better-tested tool and it keeps a useful log of the bad sectors. As a little side note, here is how you would clone an entire disk with ddrescue:

$ ddrescue -dfn /dev/input /dev/output rescue.log

-d: use direct disk access.
-f: force overwriting the output device or partition (necessary for partition/disk cloning).
-n: do not retry failed blocks (reduces disk strain and time drastically).

And how you would attempt to reallocate bad sectors on the faulty drive (WARNING: THIS WILL DESTROY ALL DATA ON THE DISK!):

$ badblocks -wsv -t 85 -c 65536 -o badblocks.txt /dev/sdb

-w: use destructive write test.
-s: show progress.
-v: verbose mode.
-t 85: make a single pass with test pattern 0x55 (the default is 4 passes, which seems excessive).
-c 65536: number of blocks tested at a time. Dramatically reduces the time to complete the test compared to the default (64).
-o badblocks.txt: write the list of bad blocks to an output file.

To finish, you should run a long SMART test on the drive to make sure everything is fine. That said, if bad sectors stop getting remapped automatically, I wouldn't recommend using this drive for anything remotely important, as it has a high chance of failing again soon:

$ smartctl -t long /dev/sdb
# check test progress and results with
$ smartctl -a /dev/sdb

/robex/ - Last edited: 2024-11-16 15:22:06