SDSrecover

=Repairing Corruption on a DiskSuite File System=
 * (Copied from here)

The procedure described here is provided by Sun in order to undo DiskSuite, restore affected file systems from backups, and redo DiskSuite. We adapt this procedure to run fsck safely on bad file systems.

Why is this needed? If a file system cannot be unmounted (e.g. the root file system) but you need to fsck it, you must boot from CD-ROM. But the OS on the CD-ROM has no DiskSuite drivers, so it can’t understand DiskSuite metadevices. You should never fsck the underlying physical slices of a DiskSuite metadevice: mirrors, for example, read round-robin from their submirrors, so fscking one underlying slice can put the submirrors out of sync. Thus, you must undo DiskSuite before fscking such file systems, and redo DiskSuite afterwards.

In our cases, the root file system was okay, but others were corrupted. To fix the corruption, the outline is:


 * Boot from CD-ROM.
 * Undo DiskSuite following Steps 5 through 7 of the following procedure from Sun.
 * Fsck the underlying slices on whichever of the mirrored disks you’re going to treat as your “good” one.
 * Repeat fsck until all slices fsck without errors. If fsck finds a problem and repairs it, fsck again until no errors are found!
 * Continue with the procedure from Sun at Step 8 to re-do DiskSuite.
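The "repeat fsck until clean" step in the outline can be sketched as a small loop. This is only an illustration, not part of the Sun procedure: the function name `fsck_until_clean`, the `FSCK` and `MAX_PASSES` variables, and the use of `-y` (answer yes to every repair prompt) are all assumptions, as is the check for the "FILE SYSTEM WAS MODIFIED" banner that UFS fsck prints after making repairs. It also assumes a POSIX-style shell.

```shell
#!/bin/sh
# Sketch: re-run fsck on one slice until a pass reports no repairs.
# FSCK is overridable so the loop can be tried without touching a real disk.
FSCK=${FSCK:-fsck}
MAX_PASSES=${MAX_PASSES:-5}   # hypothetical safety limit, not from the procedure

fsck_until_clean() {
    slice=$1
    pass=1
    while [ "$pass" -le "$MAX_PASSES" ]; do
        echo "fsck pass $pass on $slice"
        if out=$($FSCK -y "$slice" 2>&1); then status=0; else status=$?; fi
        echo "$out"
        # Assumption: a repairing pass prints "FILE SYSTEM WAS MODIFIED".
        if [ "$status" -eq 0 ] &&
           ! echo "$out" | grep "FILE SYSTEM WAS MODIFIED" >/dev/null; then
            echo "$slice is clean after $pass pass(es)"
            return 0
        fi
        pass=$((pass + 1))
    done
    echo "$slice still not clean after $MAX_PASSES passes" >&2
    return 1
}

# Usage (slice names are placeholders for the "good" side of each mirror):
#   for s in /dev/rdsk/c0t0d0s3 /dev/rdsk/c0t0d0s5; do
#       fsck_until_clean "$s" || break
#   done
```

Running fsck by hand and reading its output is just as valid; the loop only makes the "fsck again until no errors are found" rule harder to forget.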

NOTES:


 * The following document is unaltered except for formatting.
 * At Step (8), boot single user first, and set a root password. Recall that our standard practice is to have no root password in /etc/passwd or /etc/shadow, but you won’t be able to become root using suw after logging in as yourself when you’re booting single-user and leaving other file systems (e.g. /u) unmounted. So you must set a root password and then log in as root.
 * Step (10) talks about restoring the contents of md.tab. This file contains only comments, even on live systems, and the example they show is not actually md.tab; it matches what is found in md.cf.
 * Rather than using md.cf and running metainit to automatically recreate all the DiskSuite metadevices as per Step 11, I recreated all the DiskSuite metadevices (including metadb replicas) “by hand” using documentation we typically stash in the local_ .uwaterloo.ca package about the machine’s DiskSuite configuration. So I can’t confirm whether or not the automatic approach works as documented.

http://sunsolve.central.sun.com/search/document.do?assetkey=1-25-14650-1
*Resolution*

NOTE: In order to restore the machine exactly the way it was, it is necessary to have a record of the locations of the state databases, and also the configuration of the metadevices. These are *not* held in user-readable form on the root filesystem, so it is useful to get the information BEFORE a crash with the commands:

# metadb -i

This command shows the condition and location of DiskSuite's state database replicas. It is useful to have this information in case their recreation is necessary.

# metastat -p

Redirect this output to a file for safekeeping. This information is most often sent to the /etc/opt/SUNWmd/md.tab file for disaster recovery purposes (see step 10 below). Check the md.tab file prior to overwriting it, though. If a valid one exists, redirect the metastat -p output to another filename and run a 'diff' on it to check for inconsistencies. The metastat -p output, however, will reflect the most current known configuration.
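As a rough sketch of the advice above, the two commands can be wrapped so their output is saved to files and compared against an existing md.tab rather than overwriting it. The function name `save_sds_config`, the output file names, and the destination directory are hypothetical; only `metadb -i`, `metastat -p`, and `diff` come from the text.

```shell
#!/bin/sh
# Sketch: capture the DiskSuite configuration while the system is healthy.
save_sds_config() {
    outdir=$1
    # md.tab lives in /etc/opt/SUNWmd or /etc/lvm, depending on the version.
    mdtab=${2:-/etc/opt/SUNWmd/md.tab}
    mkdir -p "$outdir" || return 1
    metadb -i   > "$outdir/metadb-i.txt"   || return 1
    metastat -p > "$outdir/metastat-p.txt" || return 1
    # Compare against an existing md.tab instead of overwriting it.
    if [ -f "$mdtab" ]; then
        if diff "$mdtab" "$outdir/metastat-p.txt"; then
            echo "md.tab matches metastat -p"
        else
            echo "md.tab differs from metastat -p (see diff above)"
        fi
    fi
}

# Usage (the directory is a placeholder; keep a copy off the machine too):
#   save_sds_config /var/tmp/sds-config
```

A copy kept only on the root disk is of little use after that disk fails, so the saved files should also be stored somewhere else.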

Of course, you will need to create a new root disk and restore the last full backup to that new disk. It is expected that you will partition the disk in much the same manner as you had it before. Remember that if your original disk had a small partition for a state database replica, you need to create the same partition on this disk.

Once the disk has been created, restore your root filesystems from backups. However, before attempting to boot the system after the restore, we must make some necessary changes to let Solaris know that the root disk is no longer mirrored. To do this, the /etc/system and the /etc/vfstab files must be modified.

1) Boot cdrom, and mount the root filesystem to /a.

ok boot cdrom -s
# mount /dev/dsk/c#t#d#s# /a

2) Restore the root filesystem from backup tape into /a and initialize the boot block using

# installboot /usr/platform//lib/fs/ufs/bootblk /dev/rdsk/c#t#d#s#

3) Mount and restore any other critical filesystems, such as /usr, /var, /opt, etc.

4) If the replacement machine does not have its disks connected via the same paths as the original, then you may have to follow the procedure documented in SRDB 15010 to rebuild the /devices and /dev structures.

5) vi /a/etc/system (before using vi you may need to set the terminal type e.g. TERM=sun; export TERM)

Remove ALL lines between the "MDD root info" lines as well as those between the "MDD database info" lines. In the following example file, all these lines would be removed from the file:

---
* Begin MDD root info (do not edit)
forceload: misc/md_trans
forceload: misc/md_raid
forceload: misc/md_hotspares
forceload: misc/md_stripe
forceload: misc/md_mirror
forceload: drv/sd
forceload: drv/esp
forceload: drv/espdma
forceload: drv/sbus
forceload: drv/iommu
rootdev:/pseudo/md@0:0,3,blk
* End MDD root info (do not edit)
* Begin MDD database info (do not edit)
set md:mddb_bootlist1="sd:14:16 sd:15:16"
* End MDD database info (do not edit)
---
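If you would rather not do this edit in vi, the same deletion can be sketched with sed. The function name `strip_mdd_blocks` is hypothetical, and the patterns assume the marker lines look exactly like those in the example above. The result is written to standard output so the original file is left untouched.

```shell
#!/bin/sh
# Sketch: print an /etc/system file with both MDD blocks removed,
# including the "Begin" and "End" marker lines themselves.
strip_mdd_blocks() {
    sed -e '/^\* Begin MDD root info/,/^\* End MDD root info/d' \
        -e '/^\* Begin MDD database info/,/^\* End MDD database info/d' "$1"
}

# Usage (keeps a copy of the original for step 12 and for comparison):
#   cp /a/etc/system /a/etc/system.orig
#   strip_mdd_blocks /a/etc/system.orig > /a/etc/system
```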

6) vi /a/etc/vfstab

Change all metadevices for the root filesystems (root, usr, var, and opt) back to regular slices (/dev/dsk/c#t#d#s#). Comment out all the other metadevices for the time being. For example:

** EXAMPLE FILE BEFORE:
---
#device         device           mount         FS    fsck mount   mount
#to mount       to fsck          point         type  pass at boot options
#
/proc           -                /proc         proc  -    no      -
fd              -                /dev/fd       fd    -    no      -
swap            -                /tmp          tmpfs -    yes     -
/dev/md/dsk/d1  -                -             swap  -    no      -
/dev/md/dsk/d0  /dev/md/rdsk/d0  /             ufs   1    no      -
/dev/md/dsk/d2  /dev/md/rdsk/d2  /usr          ufs   1    no      -
/dev/md/dsk/d3  /dev/md/rdsk/d3  /var          ufs   2    yes     -
/dev/md/dsk/d4  /dev/md/rdsk/d4  /opt          ufs   2    yes     -
/dev/md/dsk/d5  /dev/md/rdsk/d5  /export/home  ufs   3    yes     quota
/dev/md/dsk/d6  /dev/md/rdsk/d6  /export/home1 ufs   3    yes     quota
---

** EXAMPLE FILE AFTER:
---
#device           device             mount         FS    fsck mount   mount
#to mount         to fsck            point         type  pass at boot options
#
/proc             -                  /proc         proc  -    no      -
fd                -                  /dev/fd       fd    -    no      -
swap              -                  /tmp          tmpfs -    yes     -
/dev/dsk/c0t0d0s1 -                  -             swap  -    no      -
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 /             ufs   1    no      -
/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /usr          ufs   1    no      -
/dev/dsk/c0t0d0s7 /dev/rdsk/c0t0d0s7 /var          ufs   2    yes     -
/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /opt          ufs   2    yes     -
#/dev/md/dsk/d5   /dev/md/rdsk/d5    /export/home  ufs   3    yes     quota
#/dev/md/dsk/d6   /dev/md/rdsk/d6    /export/home1 ufs   3    yes     quota
---
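The before/after change can also be sketched with awk, given a list saying which slice each root metadevice should fall back to. Everything here is illustrative: the function name `unmirror_vfstab` and the `d0=c0t0d0s0` mapping syntax are made up for this sketch, and the mapping itself has to come from your own records. Metadevices without a mapping are commented out. Rewritten lines lose their column alignment, which should not matter because vfstab fields are whitespace-separated.

```shell
#!/bin/sh
# Sketch: point mapped metadevices back at plain slices, comment out the rest.
# $1 = vfstab file, $2 = space-separated "metadevice=slice" pairs.
unmirror_vfstab() {
    awk -v map="$2" '
        BEGIN {
            n = split(map, pairs, " ")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                slice[kv[1]] = kv[2]
            }
        }
        $1 ~ /^\/dev\/md\/dsk\// {
            md = $1
            sub(/^\/dev\/md\/dsk\//, "", md)
            if (md in slice) {
                $1 = "/dev/dsk/" slice[md]
                if ($2 != "-") $2 = "/dev/rdsk/" slice[md]
                print
            } else {
                print "#" $0      # no mapping: leave it commented out for now
            }
            next
        }
        { print }
    ' "$1"
}

# Usage, matching the example files above:
#   cp /a/etc/vfstab /a/etc/vfstab.orig
#   unmirror_vfstab /a/etc/vfstab.orig \
#       "d0=c0t0d0s0 d1=c0t0d0s1 d2=c0t0d0s6 d3=c0t0d0s7 d4=c0t0d0s4" > /a/etc/vfstab
```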

7) Remove all lines (except for the 2 comment lines at the top) from the "mddb.cf" file. This file exists either in the /etc/opt/SUNWmd or /etc/lvm directory, depending on the version of DiskSuite you are running.
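One possible way to do this without an editor is to keep only the leading comment lines. This is a sketch that assumes the two header lines start with "#" and that the first replica entry does not; the function name `keep_header_comments` is hypothetical.

```shell
#!/bin/sh
# Sketch: print only the comment lines at the top of mddb.cf,
# stopping at the first line that is not a comment.
keep_header_comments() {
    awk '/^#/ { print; next } { exit }' "$1"
}

# Usage (path depends on the DiskSuite version; /a is the mounted root):
#   cp /a/etc/opt/SUNWmd/mddb.cf /a/etc/opt/SUNWmd/mddb.cf.orig
#   keep_header_comments /a/etc/opt/SUNWmd/mddb.cf.orig > /a/etc/opt/SUNWmd/mddb.cf
```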

8) Boot the system from the newly restored boot disk. When the system comes up, only root (and /usr, /var, and /opt, if they exist) will be mounted using the slices on the new root disk.

9) Re-add the state databases with the 'metadb' command. Use the output of the 'metadb' command you saved to get the locations.

metadb -a -f
metadb -a -f
metadb -a -f
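Each metadb command above needs the slice that held the replicas, which is what the saved 'metadb' output is for. As a hedged sketch, the commands can be generated from that saved output rather than typed from memory. The function name `metadb_commands` is hypothetical, and the parsing assumes each replica line of the saved output ends with a /dev/dsk/... path, as in the sample below. The `-c` option sets the number of replicas placed on a slice. The commands are only printed, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: turn saved "metadb" output into "metadb -a -f" commands,
# one per slice, preserving the order in which slices first appear.
metadb_commands() {
    awk '
        $NF ~ /^\/dev\/dsk\// {
            if (!($NF in count)) order[++n] = $NF
            count[$NF]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "metadb -a -f -c %d %s\n", count[order[i]], order[i]
        }
    ' "$1"
}

# Sample saved output (the slices are made up for illustration):
#         flags           first blk       block count
#      a m  p  luo        16              1034            /dev/dsk/c0t0d0s7
#      a    p  luo        1050            1034            /dev/dsk/c0t0d0s7
#      a    p  luo        16              1034            /dev/dsk/c1t0d0s7
#
# Usage: metadb_commands /var/tmp/sds-config/metadb-i.txt
```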

10) Recreate the "md.tab" file from the 'metastat' output you saved (or from memory), except make sure that all mirrors are one-way mirrors and all RAID5 devices contain the "-k" option. For the boot disk, ensure that the one-way mirrors refer to the side which has been restored. Make sure the order is correct so that mirrors aren't created until the submirrors already exist. See below the necessary changes that must be made to ensure one-way mirrors are used.

The "md.tab" file exists either in the /etc/opt/SUNWmd or /etc/lvm directory, depending on the version of DiskSuite you are running.

For example:

vi md.tab

** EXAMPLE FILE BEFORE:
---
d0 -m d10 d20                       <-- see the two-way mirror!
d10 1 1 c0t0d0s0
d20 1 1 c1t0d0s0
d1 -m d11 d21
d11 1 1 c0t0d0s1
d21 1 1 c1t0d0s1
d2 -m d12 d22
d12 1 1 c0t0d0s6
d22 1 1 c1t0d0s6
d3 -m d13 d23
d13 1 1 c0t0d0s7
d23 1 1 c1t0d0s7
d4 -m d14 d24
d14 1 1 c0t0d0s4
d24 1 1 c1t0d0s4
d5 -m d15 d25
d15 4 c0t1d0s2 c0t2d0s2 c1t1d0s2 c1t2d0s2
d25 4 c0t3d0s2 c1t3d0s2 c1t4d0s2 c1t5d0s2
d6 -r c1t6d0s2 c1t8d0s2 c1t9d0s2
---

** EXAMPLE FILE AFTER:
---
d10 1 1 c0t0d0s0
d20 1 1 c1t0d0s0
d0 -m d10                           <-- see the one-way mirror!
d11 1 1 c0t0d0s1
d21 1 1 c1t0d0s1
d1 -m d11
d12 1 1 c0t0d0s6
d22 1 1 c1t0d0s6
d2 -m d12
d13 1 1 c0t0d0s7
d23 1 1 c1t0d0s7
d3 -m d13
d14 1 1 c0t0d0s4
d24 1 1 c1t0d0s4
d4 -m d14
d15 4 c0t1d0s2 c0t2d0s2 c1t1d0s2 c1t2d0s2
d25 4 c0t3d0s2 c1t3d0s2 c1t4d0s2 c1t5d0s2
d5 -m d15
d6 -r c1t6d0s2 c1t8d0s2 c1t9d0s2 -k
---
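The three changes shown above (one-way mirrors, "-k" on RAID5 devices, submirrors defined before mirrors) can be sketched as a single awk pass over the saved configuration. The function name `one_way_mdtab` is hypothetical. Note the assumptions: it keeps the first submirror of every mirror, which is only right if that is the side you restored or fscked, and it moves all mirror lines to the end of the file rather than interleaving them as in the example. Comment and blank lines are dropped.

```shell
#!/bin/sh
# Sketch: rewrite saved md.tab / "metastat -p" lines for recovery.
one_way_mdtab() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }          # skip blanks and comments
        $2 == "-m" {                           # mirror: keep first submirror only
            mirror[++n] = $1 " -m " $3
            next
        }
        $2 == "-r" {                           # RAID5: make sure -k is present
            has_k = 0
            for (i = 3; i <= NF; i++) if ($i == "-k") has_k = 1
            print (has_k ? $0 : $0 " -k")
            next
        }
        { print }                              # stripes and concats pass through
        END { for (i = 1; i <= n; i++) print mirror[i] }
    ' "$1"
}

# Usage (file names are placeholders):
#   one_way_mdtab /var/tmp/sds-config/metastat-p.txt > /etc/opt/SUNWmd/md.tab
```

Review the result by hand before running metainit, particularly which submirror each one-way mirror refers to.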

11) Run the 'metainit' command to create all the metadevices that are listed in the "md.tab" file.

# metainit -f -a

12) Run the 'metaroot' command to set the metadevice as a root device.

# metaroot d0

13) Add the other metadevices to the /etc/vfstab file for the other root filesystems, as well as the other metadevices that you created in step 11.

14) Reboot.

15) You will now attach all the second submirrors to all the mirrored metadevices. For example:

# metattach d0 d20
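With many mirrors it is easy to miss one, so the full list of metattach commands can be derived from the original (two-way) configuration saved earlier. This is a sketch only: the function name `metattach_commands` is hypothetical, and it assumes mirror lines have the form "dN -m dA dB ..." as in the example md.tab. The commands are printed for review, not executed.

```shell
#!/bin/sh
# Sketch: for each mirror line in the ORIGINAL md.tab / "metastat -p" output,
# print a metattach command for every submirror after the first.
metattach_commands() {
    awk '$2 == "-m" {
        for (i = 4; i <= NF; i++)
            if ($i ~ /^d[0-9]+$/)          # ignore trailing options or numbers
                printf "metattach %s %s\n", $1, $i
    }' "$1"
}

# Usage (file name is a placeholder for the saved two-way configuration):
#   metattach_commands /var/tmp/sds-config/metastat-p.txt
```

Each metattach starts a resync from the existing submirror onto the newly attached one.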