Notifying ReiserFS of the bad area

Note

It appears that reiserfstune cannot be run on a mounted file system, so we have to unmount the file system after all. The only advantage we have gained is that this can now (hopefully) be a quick operation.
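reiserfstune will not work on a mounted file system, so it can help to check for that before calling it. The following is a minimal POSIX shell sketch. The function name is_mounted is hypothetical. It assumes the Linux /proc/mounts format, in which the first field of each line is the device. It compares device names literally, so it will not detect a file system that was mounted through another name for the same device (for example a link under /dev/disk/).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if DEVICE is listed as the
# source of a mounted file system, fail (status 1) otherwise.
# Usage: is_mounted DEVICE [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
is_mounted() {
    # Field 1 of each line is the device; stop at the first exact match.
    # END still runs after "exit", and turns "found" into the exit status.
    awk -v dev="$1" '$1 == dev { found = 1; exit } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example, is_mounted /dev/sda9 && echo "/dev/sda9 is still mounted" >&2 could precede the reiserfstune call.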

Note

It also appears that reiserfstune cannot accept the output of badblocks as is. The two programs count disk space in different units (see Table 1), so we have to convert the block numbers:

    for n in `cat bad_blocks.dev.sda9` ; do echo -e "${n}\n8\n/\np"|dc ; done > converted-badblocks-file
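The dc loop divides every block number by a fixed factor, and that factor has to be the ratio of the two block sizes. Table 1 lists 1024-byte badblocks blocks and 4096-byte ReiserFS blocks, which gives a ratio of 4. The loop above divides by 8, which corresponds to 512-byte input blocks, so check which block size badblocks was actually run with. The POSIX shell sketch below takes both sizes as parameters instead of hard-coding the divisor. The function name to_fs_blocks is hypothetical. Several consecutive badblocks blocks can fall into the same ReiserFS block, so the sketch also removes duplicates.

```shell
#!/bin/sh
# Hypothetical helper: read block numbers (one per line) counted in SRC_SIZE-byte
# units on stdin, and print the numbers of the DST_SIZE-byte blocks containing
# them, sorted and without duplicates. Both numberings must start at the
# beginning of the same partition, and DST_SIZE must be a multiple of SRC_SIZE.
# Usage: to_fs_blocks SRC_SIZE DST_SIZE < infile > outfile
to_fs_blocks() {
    ratio=$(( $2 / $1 ))
    while read -r n; do
        [ -n "$n" ] || continue      # skip empty lines
        echo $(( n / ratio ))        # integer division rounds down
    done | sort -n -u
}
```

Under these assumptions, to_fs_blocks 1024 4096 < bad_blocks.dev.sda9 > bad_blocks.dev.sda9.base4096 would produce the list used further below. Alternatively, the -b option of badblocks makes it count in 4096-byte blocks, so that its output needs no conversion.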

When we try to notify the filesystem of its bad blocks (reiserfsck --add-badblocks <converted-badblocks-file> --fix-fixable /dev/sda9), the command fails with an error saying that the block under consideration is already in use and that reiserfsck should be used to repair the filesystem.

Warning

The suggested reiserfsck run then fails with a segmentation fault, and we are glad to escape with our filesystem intact. There is no way that I will use reiserfsck --rebuild-tree on an already populated filesystem.

Note

I think we'd better stay away from ReiserFS from now on.

We take another route and read every file on the affected filesystem, discarding the data. This fails with the following message:

    find /var/ -type f -exec cat {} \; >/dev/null
    cat: /var/lib/postgresql/8.1/main/base/16629/16667: Input/output error

Since we know that only a single sector is affected, this must be the file that causes the messages in our logs. So we'll do the following:

Warning

During the following procedure, a long list of SCSI driver errors (as in Example 2) is often still in the kernel ring buffer. They scroll across the console whenever syslog-ng is started, stopped or reloaded. This may look disturbing, but it does not indicate that the bad part of the disk is still being accessed.

  1. Stop all services that use the filesystem by switching to single-user mode (runlevel 1).

  2. Stop syslog-ng too, as it uses /var and is still active in runlevel 1:

    /etc/init.d/syslog-ng stop

  3. Unmount the filesystem [16]:

    umount /var

  4. Mount it somewhere else:

    mkdir /mnt/sda9 && mount /dev/sda9 /mnt/sda9

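  5. Copy the file that lies on the bad sector to another filesystem [17]:

    dd if=/mnt/sda9/lib/postgresql/8.1/main/base/16629/16667 of=/home/16667 conv=noerror,sync

    With noerror alone, dd skips an unreadable block and shifts all following data. Adding sync pads that block with zeros, so the copy keeps its original size and layout.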

  6. Unmount the partition:

    umount /dev/sda9

    1. Notify the filesystem of its bad blocks using reiserfstune:

      reiserfstune --add-badblocks ~/bad_blocks.dev.sda9.base4096 /dev/sda9

    2. When this fails with the already-in-use message, we try:

      /sbin/reiserfsck --badblocks ~/bad_blocks.dev.sda9.base4096 /dev/sda9

      Note

      This failed during an earlier try, but it succeeded this time. YMMV.

  7. Remount the partition at the alternative mount point:

    mount /dev/sda9 /mnt/sda9

  8. Move the file back into place:

    mv /home/16667 /mnt/sda9/lib/postgresql/8.1/main/base/16629/

  9. Unmount the partition from its alternative mount point:

    umount /dev/sda9

  10. Mount it in its usual place:

    mount /dev/sda9

  11. Start syslog-ng again:

    /etc/init.d/syslog-ng start

  12. Return to multi-user mode.
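Steps 2 to 11 can be collected in a script that is started from the single-user shell once step 1 is done. Entering single-user mode normally terminates running processes, so a script that switched runlevels itself would not survive. The sketch below is hypothetical. The device, mount points and file names are the ones from this example, and the variable DRY_RUN is our own invention. By default the script only prints the commands, and it executes them only with DRY_RUN=0. It uses conv=noerror,sync for dd, so an unreadable block is replaced by zeros instead of being left out, and the rest of the file stays at its original offsets. It does not handle file systems mounted below /var (footnote [16]).

```shell
#!/bin/sh
# Hypothetical sketch of steps 2-11, to be run as root from the single-user
# shell after step 1. With DRY_RUN unset or 1 it only prints the commands.
DEV=/dev/sda9
ALT=/mnt/sda9
REL=lib/postgresql/8.1/main/base/16629/16667   # damaged file, relative to /var
SAVE=/home/16667                               # rescue copy on another file system
BADLIST=$HOME/bad_blocks.dev.sda9.base4096     # bad block list in 4096-byte units

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 0 ] || return 0
    "$@"
}

# Each step stops the procedure (returning the failing status) if it fails.
repair_procedure() {
    run /etc/init.d/syslog-ng stop || return                       # step 2
    run umount /var || return                                      # step 3
    run mkdir -p "$ALT" || return                                  # step 4
    run mount "$DEV" "$ALT" || return
    run dd if="$ALT/$REL" of="$SAVE" conv=noerror,sync || return   # step 5
    run umount "$DEV" || return                                    # step 6
    # Step 6, substeps 1-2: fall back to reiserfsck if reiserfstune fails.
    run reiserfstune --add-badblocks "$BADLIST" "$DEV" ||
        run reiserfsck --badblocks "$BADLIST" "$DEV" || return
    run mount "$DEV" "$ALT" || return                              # step 7
    run mv "$SAVE" "$ALT/$(dirname "$REL")/" || return             # step 8
    run umount "$DEV" || return                                    # step 9
    run mount "$DEV" || return                                     # step 10, via /etc/fstab
    run /etc/init.d/syslog-ng start                                # step 11
}
```

After sourcing the file, repair_procedure shows what would be done. Running DRY_RUN=0; repair_procedure actually performs the steps.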

After all this is done, we see no more SCSI errors in the logs. The following command confirms that block 6512898 is now in the filesystem's list of bad blocks:

    debugreiserfs -B /tmp/bad /dev/sda9 && cat /tmp/bad
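The check that the blocks have really been recorded can also be scripted. The sketch below compares the list we passed in with the list that debugreiserfs -B wrote. It assumes both files hold one decimal block number per line. The function name missing_blocks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the block numbers listed in WANTED_FILE that are
# missing from LISTED_FILE. No output means every block has been recorded.
# Usage: missing_blocks WANTED_FILE LISTED_FILE
missing_blocks() {
    # -F: fixed strings, -x: whole-line match, -v: print non-matching lines
    grep -vxF -f "$2" "$1"
    # grep exits 1 when nothing is missing; only a status above 1 is an error
    [ $? -le 1 ]
}
```

For example, missing_blocks ~/bad_blocks.dev.sda9.base4096 /tmp/bad should print nothing once all blocks are registered.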

Example 2. SCSI errors in the log

    Raw sense data:0xf0 0x00 0x03 0x0a 0xb5 0xf0 0xfd 0x0a 0x00 0x00 0x00 0x00 0x11 0x00 0xe4 0x80 0x00 0x86
     I/O error: dev 08:09, sector 104206368
    scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 0a b5 f0 fa 00 00 08 00
    Info fld=0xab5f0fd, Current sd08:09: sns = f0  3
    ASC=11 ASCQ= 0


Table 1. Units of disk space used by the programs involved in detecting bad blocks on ReiserFS

    program              unit     size
    kernel SATA driver   sector   512 bytes
    badblocks            block    1024 bytes
    ReiserFS             block    4096 bytes
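The sizes in Table 1 fix the conversions between the three numberings. Write s for a kernel sector number, b for a badblocks block number and B for a ReiserFS block number (symbols introduced here for illustration). If all three are counted from the start of the same partition, they are related as follows:

```latex
\begin{aligned}
B &= \left\lfloor \frac{512\,s}{4096} \right\rfloor = \left\lfloor \frac{s}{8} \right\rfloor, \\
B &= \left\lfloor \frac{1024\,b}{4096} \right\rfloor = \left\lfloor \frac{b}{4} \right\rfloor, \\
\text{ReiserFS block } B &\text{ covers sectors } 8B,\ldots,8B+7 \text{ and badblocks blocks } 4B,\ldots,4B+3.
\end{aligned}
```

For instance, ReiserFS block 6512898 covers badblocks blocks 26051592 to 26051595.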



[16] Of course, we first have to unmount any (possibly remote) file systems mounted on subdirectories of /var, e.g. /var/mail.

[17] We cannot use cp, as it stops when it encounters the bad sector and copies only part of the file.
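The unmounting described in footnote [16] can be prepared with a small helper that lists every mount point below a directory, deepest first. That is the order in which they have to be unmounted, because a nested mount point is always longer than the one containing it. This POSIX shell sketch uses a hypothetical function name. It reads the Linux /proc/mounts format, where the mount point is the second field, and it does not decode the octal escapes (such as \040 for a space) that the kernel uses there.

```shell
#!/bin/sh
# Hypothetical helper: print the mount points strictly below DIR, longest
# (deepest) first, from a /proc/mounts-style file.
# Usage: submounts_below DIR [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
submounts_below() {
    dir=${1%/}    # accept both /var and /var/
    # Keep mount points that start with "DIR/", prefixed by their length,
    # then sort by that length in descending order and drop the prefix.
    awk -v dir="$dir" 'index($2, dir "/") == 1 { print length($2) " " $2 }' \
        "${2:-/proc/mounts}" | sort -k1,1nr | cut -d' ' -f2-
}
```

For example, for m in $(submounts_below /var); do umount "$m"; done would run before umount /var in step 3.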