Sun-related

=Sun Solaris Related Information=
 * SUN: How to Use FTP to Submit Sun Explorer Files
 * Service Management Facility (SMF)
 * How to troubleshoot automountd
 * Solaris Tips and Tricks and here and here
 * Syslog Related info
 * How to add Solaris 10 server into MS Active Directory domain
 * JumpStart HOWTO (from here)

Consoles

 * Which systems have ILOM/ALOM4v/ALOM/LOM/ELOM/SP or RSC
 * ILOM information
 * ALOM information
 * Create an ALOM login on an ILOM (see here)
 * NEW USER: -> create /SP/users/<username> role=Administrator cli_mode=alom
 * EXISTING USER: -> set /SP/users/<username> cli_mode=alom
 * Expected response: Set 'cli_mode' to 'alom'


 * X4200 ILOM tricks. Also: Sun x4200 ILOM docs.
 * Current Patches
 * 118323 -> Sun Fire 280R / Blade 1000 / 2000
 * 119381 -> RSC 2.2.3 bug fixes for Solaris 9 and Solaris 10
 * 147307 -> T5120 / T5220
 * 142700 -> v210/v240 netra 210/240 OBP
 * 142707 -> v440 / N440 OBP PROM
 * 142702 -> v480 / v490 PROM update

Enable/Disable System Components
Ran into this after a memory upgrade on a T5220. On boot it complained:
 * ERROR: MB/CMP0/MCU0 unused because MB/CMP0/MCU1 is not configured


 * To display system components (at ILOM CLI): show -level all -o table component_state 
 * In this instance I saw "/SYS/MB/CMP0/MCU1  | component_state        | Disabled"


 * Looking here I found the commands needed to fix this:
 * set /SYS/MB/CMP0/MCU1 component_state=Enabled
 * stop /SYS
 * start /SYS 

Storage

 * Sun storage and Storedge 3510 info
 * Sun storage and Storedge 6140 info


 * Solaris Storage related stuff
 * Repairing Corruption on a DiskSuite File System from here

Setup

 * Solaris upgrade - both online & offline
 * Solaris 10 setup
 * Solaris 10 SNMP setup
 * Setup SAR
 * Sun Solaris Security Toolkit
 * Solaris 10 password rules and Solaris 9 password rules
 * If you ever get an error "fatal: libcrypt_d.so.1: open failed: No such file or directory" then :
 * Mount the Solaris install DVD
 * Change directory to : cd /cdrom/cdrom0/s0/Solaris_10/Product
 * Install the required packages : pkgadd -d . SUNWcrman SUNWcry SUNWcryr

Ports & PIDs
Script to Show Ports & PIDs.
 * Listing all the pids: /usr/bin/ps -ef | sed 1d | awk '{print $2}'
 * Mapping the files to ports using the PID: /usr/proc/bin/pfiles <PID> 2>/dev/null | /usr/xpg4/bin/grep <port> OR /usr/bin/ps -o pid -o args -p <PID> | sed 1d
 * Mapping the sockname to port using the port number: for i in `ps -e|awk '{print $1}'`; do echo $i; pfiles $i 2>/dev/null | grep 'port: 8080'; done OR pfiles -F /proc/* | nawk '/^[0-9]+/ { proc=$2} ; /[s]ockname: AF_INET/ { print proc "\n " $0 }'
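
The PID-to-port matching can also be done in one pass over pfiles output. The following is a minimal sketch, not a tested Solaris tool: pids_on_port is a hypothetical helper name, and the input layout (a "PID:" line in column one for each process, socket lines ending in "port: N") is an assumption about what pfiles prints. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
#!/bin/sh
# pids_on_port is a hypothetical helper name.
# $1    = port number to look for
# stdin = pfiles-style text: each process starts with a "PID: command" line
#         in column one; sockets appear as "sockname: AF_INET addr  port: N"
# Prints each PID once if any of its sockets is bound to that exact port.
pids_on_port() {
    awk -v port="$1" '
        /^[0-9]+:/ { pid = $1; sub(/:$/, "", pid) }
        /sockname: AF_INET/ && $(NF-1) == "port:" && $NF == port && !seen[pid]++ { print pid }
    '
}

# On a real system (assumed invocation, mirroring the one-liner above):
#   pfiles -F /proc/* 2>/dev/null | pids_on_port 8080
```

Comparing the last field exactly avoids the partial matches that a plain grep for 'port: 80' would produce for ports such as 8080.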

Hardware Troubleshooting
From here.
 * How to check for hardware faults : /usr/sbin/fmadm faulty
 * To clear error logs within FMA use the following steps :
 * clearing out FMA files with no reboot needed:
 * svcadm disable -s svc:/system/fmd:default
 * cd /var/fm/fmd
 * find /var/fm/fmd -type f -exec ls {} \;
 * find /var/fm/fmd -type f -exec rm {} \;
 * svcadm enable svc:/system/fmd:default


 * After these steps, check whether the problem still exists.
 * fmadm faulty

Device Drivers
See the driver.conf(4) man page for information on device drivers.
 * Add a new Device Driver : add_drv
 * Remove a Device Driver : rem_drv
 * Update a Device Driver : update_drv

Misc

 * CRON log : /var/cron/log
 * Global Shell settings (i.e. prompt) : /etc/profile
 * Check when last patched : ls -lrt /var/sadm/patch/  (check timestamps of patch directories)
 * Configure runtime linking environment : use crle command
 * Global Profile Settings : /etc/profile (also more stuff in /etc/default/)
 * Solaris Zone Setup Information
 * Samba on Solaris 10
 * Convert Logical to Physical disk (i.e. md0 to CxTxDx)
 * convert ssd to cxtxdx
 * iostat -E | grep Soft | awk '{ print $1}' > /tmp/a; iostat -En | grep Soft|awk '{ print $1 }' > /tmp/b; paste /tmp/a /tmp/b


 * Show Ports & PIDs
 * Display System and Process Information
 * Solaris Internals WIKI
 * TCP Wrappers in Solaris 10
 * Get description of SMF services : svcs -a -o FMRI,DESC
 * Show network interfaces : dladm show-dev


 * Use netboot to flash OBP
 * Core Dump Management & Analysis info
 * The article references a script called AppCrash, which can be found here (referenced in another article).

Clean up /var

 * Make a backup copy
 * find /var/sadm/pkg -name undo.Z -o -name obsolete.Z | cpio -pdv /u00/PATCH_BACKOUTS


 * Free up space in /var
 * find /var/sadm/pkg -name undo.Z -o -name obsolete.Z | xargs rm -f


 * In the event that you need to back out patch 999999-01 :
 * cd /u00/PATCH_BACKOUTS
 * find var/sadm/pkg | grep 999999-01 | cpio -pdv /
 * patchrm 999999-01


 * Typical strategy around this is to only clear out patch backout information that is old
 * find /var/sadm/pkg -name undo.Z -mtime +180 | xargs rm -f
 * I've never noticed the obsolete.Z files as a space hog, so I haven't pruned them.

Or, (from here): find /var/sadm/pkg \( -name undo.Z -o -name obsolete.Z \) -mtime +90 -exec rm {} \;
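
When -o is combined with -mtime or -exec, find applies the implicit "and" first, so without grouping the age test or the action attaches to only one of the two names. A minimal sketch of a grouped version that can be reviewed before anything is deleted; list_old_backouts is a hypothetical helper name, and the directory and age are passed in as arguments:

```shell
#!/bin/sh
# list_old_backouts is a hypothetical helper name.
# $1 = directory to search (e.g. /var/sadm/pkg)
# $2 = minimum age in days
# The escaped parentheses group the two -name tests so that -mtime
# applies to both undo.Z and obsolete.Z.
list_old_backouts() {
    dir=$1
    days=$2
    find "$dir" \( -name undo.Z -o -name obsolete.Z \) -mtime +"$days" -print
}

# Review the list first, then delete:
#   list_old_backouts /var/sadm/pkg 180
#   list_old_backouts /var/sadm/pkg 180 | xargs rm -f
```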

DVD and CD Stuff

 * Burn a DVD : cdrw -i <image file>
 * Check media in DVD player : cdrw -M
 * List writing devices : cdrw -l
 * Copy Disc
 * With CDRW : cdrw -c
 * Without CDRW : dd if=<source> of=<image file> [e.g. dd if=/cdrom/sol_10_508_sparc/ of=solaris.iso]
 * Create Disc from directory (ex #2 on cdrw man page) : mkisofs -r /some/dir 2>/dev/null | cdrw -i -p 1

SAN Related
Other pages:
 * Convert LD & disk devices, Sun StorEdge 3510 stuff , How to Map a Disk to a 3510 Partition
 * Map 3510 devices to Solaris, HBA Cheat Sheet , Show LUN from format name
 * mpxio quickstart

Configuring and monitoring the hardware RAID controller
For hardware such as T5xxx (and T2xxx) systems. Here is a modified version of the check script from here


    #!/bin/bash
    # Program: Check the status of LSI RAID controllers

    for volume in `raidctl -l | awk -F: '/Volume/ {print $NF}'`
    do
        if raidctl -l ${volume} | egrep "${volume}.*OPTIMAL" > /dev/null
        then
            logger -p daemon.notice "LSI Raid : system RAID and Volumes are in a good state"
        else
            logger -p daemon.notice "HARDWARE ERROR: The disk controller is no longer in an optimal state"
            logger -p daemon.notice "HARDWARE ERROR: Run raidctl to see if a disk failed"
        fi
    done
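
The OPTIMAL match at the heart of the check script can be tried out without RAID hardware by feeding it text on stdin. This is only a sketch: volume_is_optimal is a hypothetical helper name, and any sample lines fed to it are illustrative rather than captured raidctl output.

```shell
#!/bin/sh
# volume_is_optimal is a hypothetical helper name.
# $1    = volume name, e.g. c0t0d0
# stdin = text in the style of `raidctl -l <volume>` output
# Returns 0 when a line names the volume and reports OPTIMAL, non-zero otherwise.
volume_is_optimal() {
    grep -E "$1.*OPTIMAL" >/dev/null
}

# On a real system:
#   raidctl -l c0t0d0 | volume_is_optimal c0t0d0 && echo "c0t0d0 is optimal"
```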

SVM (aka SDS) Related
from here - An interesting issue where a disk isn't bad but some metadevices show "Needs maintenance".
 * First, verify it's not a disk hardware problem with iostat -en
 * Next, run format --> analyze --> read on the disk (won't harm OS)
 * Next, run metasync. Afterwards, a metastat command will show it as something like:
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components: metareplace d10 c1t0d0s0
 * Next, just do a metareplace, such as:
    # metareplace -e d10 c1t0d0s0
    # metastat d10

Printers
 * Set up network printers. An example (from here):
    # lpadmin -p chubbchkprt -v /dev/null
    # lpadmin -p chubbchkprt -m netstandard
    # lpadmin -p chubbchkprt -o dest=10.12.7.250:9100 -o protocol=TCP
    # lpadmin -p chubbchkprt -I any
    # accept chubbchkprt
    destination "chubbchkprt" now accepting requests
    # /usr/bin/enable chubbchkprt
    printer "chubbchkprt" now enabled
    # lpstat -p chubbchkprt
    printer chubbchkprt is idle. enabled since Tue Sep 30 19:57:42 2008. available.
 * A printer's IP can be found in a few places. Check:
 * /etc/lp/interface/<printer> for a line starting with "PERIPH="
 * /etc/lp/printers/<printer>/configuration for the "Options: " line

Patching
 * Pkgadd base directory is defined in "/var/sadm/install/admin/default" along with other defaults.
 * Getting Patch Clusters via WGET (see doc ID 1199543.1)
 * You can check Patches in: /var/sadm/patch
 * Patch Check Advanced (PCA)
 * Config file is pca.conf and goes either in the same dir as pca or in the /etc directory. Here's an example that puts the patchdiag.xref file and all the patches in the /export/home/PATCHES directory and skips patches number 119213 & 121657 (i.e. for Luminis):
    xrefdir=/export/home/PATCHES
    patchdir=/export/home/PATCHES
    update=auto
    ignore=119213
    ignore=121657


 * Patch Report
 * How to Remove a Solaris Patch While Booted From a Network or CD-ROM
 * After all the necessary file systems have been mounted (patches are in /var/sadm/patch/), do a dry run by running the following:
 * patchrm -a -R /a <patch_number>


 * This command does not update any of the installed software, but it validates that the necessary file systems are mounted, especially if there are non-global zones installed. If this command fails, you must resolve any issues prior to running patchrm. Make sure all file systems that contain non-global zones have been mounted, and then try again.


 * After "patchrm -a -R /a" passes, run the following:
 * patchrm -R /a <patch_number> 


 * Take care to record the exact output for any future debugging.


 * Note: Do not use any chroot commands when running patchrm from media. Also, do not run patchrm from the mounted system, for instance: /a/usr/sbin/patchrm

Password last Changed
A script (from here) which shows when user passwords were last changed.
    #!/usr/bin/perl
    # Output date format is YYYY-MM-DD

    open( S, "/etc/shadow" ) or die "Cannot open /etc/shadow: $!\n";
    while( <S> ) {
        ($user,$lastchg) = (split /:/)[0,2];
        # lastchg counts days since 1970-01-01 UTC, so gmtime avoids a timezone shift
        @t = gmtime( $lastchg*86400 );
        printf "User %-8s last changed password %0.4d-%0.2d-%0.2d (%5d)\n",
            $user, $t[5]+1900, $t[4]+1, $t[3], $lastchg;
    }
    close( S );
    exit 0;
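
For hosts without Perl, roughly the same report can be sketched in shell and awk. This is an illustration under assumptions, not a drop-in replacement: shadow_lastchg is a hypothetical helper name, it reads shadow-format lines on stdin, it skips accounts whose third field is empty or non-numeric, and the date arithmetic is a standard days-to-calendar-date conversion. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
#!/bin/sh
# shadow_lastchg is a hypothetical helper name.
# stdin = lines in /etc/shadow format; field 3 is the last password change,
#         counted in days since 1970-01-01.
# Prints "user YYYY-MM-DD" for every line with a numeric field 3.
shadow_lastchg() {
    awk -F: '
    $3 ~ /^[0-9]+$/ {
        z   = $3 + 719468                  # shift the epoch to 0000-03-01
        era = int(z / 146097)              # whole 400-year cycles
        doe = z - era * 146097             # day within the cycle
        yoe = int((doe - int(doe/1460) + int(doe/36524) - int(doe/146096)) / 365)
        doy = doe - (365*yoe + int(yoe/4) - int(yoe/100))
        mp  = int((5*doy + 2) / 153)       # month index with March as 0
        d   = doy - int((153*mp + 2) / 5) + 1
        m   = (mp < 10) ? mp + 3 : mp - 9
        y   = yoe + era*400 + (m <= 2 ? 1 : 0)
        printf "%s %04d-%02d-%02d\n", $1, y, m, d
    }'
}

# As root:  shadow_lastchg < /etc/shadow
```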

No Login Passwords
In Solaris 10 you can have a "non-login" account: one that you can sudo to, or that can run cron jobs, but that you can't log into directly. These are set up using "passwd -N". More info here. Note: Solaris 10 only.

Account Parameters

 * Using the passwd command to set parameters like :
 * -w <days> (how many days before the password expires to warn the user)
 * -n <days> (minimum number of days before you can change the password again)
 * -x <days> (maximum number of days the password is good for)


 * To set # days inactivity before account is locked :
 * usermod -f <# days> <username>


 *  /etc/default/passwd 
 * You can add "WARNWEEKS=<weeks>" in there to set the default warning period before password expiry

=Booting & Kernel Related=
 *  Show Kernel Parameters : prctl $$
 *  Show TCP parameters :
 * for i in `ndd /dev/tcp \?|awk '{print $1}'| egrep -v "\?|tcp_status|hash|close"`;do printf "%-30s: " $i;ndd /dev/tcp $i; done


 *  How to be able to FSCK "/" and everything else :
 * From the OK prompt: boot -m milestone=none

Recover kernel
A useful procedure if your kernel is whacked (i.e. read bottom of this page), found after a patch that was supposed to update the kernel left the system unable to reboot. From the OK prompt do:
    {0} ok boot -F failsafe
Or you can boot off of a Solaris CD/DVD using "boot cdrom -sw". Once in, do:
 * 1) Mount the root filesystem slice to /<root-fs-mount-point>
 * 2) rm -f /<root-fs-mount-point>/platform/`uname -m`/boot_archive
 * 3)  EITHER : "/<root-fs-mount-point>/sbin/bootadm -a update_all"
 * 4)  OR (to rebuild): "/<root-fs-mount-point>/sbin/bootadm update-archive -R /<root-fs-mount-point>"

Booting problems in Solaris

 * (from here)

1. Timeout waiting for ARP/RARP packet : At ok> type printenv and look for these parameters.
 * boot-device disk
 * mfg-switch? false
 * diag-switch? false
 * If you see "boot-device net" or a true value for the other two parameters, change them to the values above. If you do want to boot from the network, make sure the client is properly configured on the boot server and that the network connections and configuration are correct.

2. The file just loaded does not appear to be executable : The boot block on the hard disk is corrupted. Boot the system in single user mode from cdrom and reinstall the boot block.
    # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t3d0s0

3. bootblk: can't find the boot program : The boot block cannot find the boot program (ufsboot in Solaris). Either ufsboot is missing or corrupted. In such cases it can be restored from the cdrom after booting from cdrom and mounting the hard disk.
    # cp /platform/`uname -i`/ufsboot /mnt/platform/`uname -i`

4. boot: cannot open kernel/unix : The kernel directory, or the unix kernel file in it, is not found. It was probably deleted during fsck or by mistake. Copy it from the cdrom or restore it from the backup tape.
    # cp /platform/`uname -i`/kernel/unix /mnt/platform/`uname -i`/kernel

5. Error reading ELF header : The kernel directory, or the unix kernel file in it, is corrupted. Copy it from the cdrom or restore it from the backup tape.
    # cp /platform/`uname -i`/kernel/unix /mnt/platform/`uname -i`/kernel

6. Cannot open /etc/path_to_inst : System cannot find the /etc/path_to_inst file. It might be missing or corrupted and needs to be rebuilt. To rebuild this file boot the system with the -ar option : ok> boot -ar
 * Press enter to select default values for the questions asked during booting and select yes to rebuild /etc/path_to_inst
 * The /etc/path_to_inst on your system does not exist or is empty. Do you want to rebuild this file [n]? y
 * System will continue booting after rebuilding the file.

7. Can't stat /dev/rdsk/c0t3d0s0 : When booted from cdrom, fsck reports the root partition as fine, but this error occurs on booting from the root disk. The device name for / is missing from the /dev/dsk directory; to resolve the issue, the /dev and /devices directories have to be restored from root backup tapes.

Booting
OBP quick reference
 * boot -a => Ask me. Interactive mode prompts for the names of the boot files. (Helpful if you need to boot off an alternate /etc/system file after kernel tunable modifications.)
 * boot -D default-file => Boot from default-file.
 * boot -f => When booting an Autoclient system, forces boot program to bypass client's local cache and read all files over the network from the file server.
 * boot -h => Boot halted. Boot into a halted state (ok prompt). Interesting for troubleshooting boot at the lowest level.
 * boot -r => Reconfigure boot. Boot and search for all attached devices, then build device entries for anything which does not already exist. Useful when new devices are added to the system.
 * boot -s => Single user. Boots the system to run level S.
 * boot -v => Verbose boot. Show good debugging information.
 * boot -V => Verbose boot. Show a little debugging information.

=ASR Setup and Troubleshooting=
 * Example of setting up ASR

ASR Troubleshooting & Documentation

 * ASR General Troubleshooting (old - v3.2)
 * Installation and Operations Guide (v3.4)
 * If you see an error message such as the one below (see HERE), fix by doing: ln -s /usr/lib/libm.so.1 /usr/lib/libm.so.2
 * ld.so.1: stclient: fatal: libm.so.2: open failed: No such file or directory

Setup

 * Before starting, install Service Tags and Service Tools Bundle (STB) on ALL systems.
 * Designate an ASR Manager server - then install SASM (SUNWsasm) on it.
 * Add the asr command to the PATH (update root's .profile, .cshrc, .kshrc, or .bashrc as needed):
 * PATH=$PATH:/opt/SUNWswasr/bin ; export PATH


 * On the ASR Manager server, run: asr register
 * To enable SASM service: svcadm enable sasm
 * To check registration status: asr show_reg_status
 * To test connection do: asr test_connection
 * On ASR Manager server copy /opt/SUNWswasr/ASRAssetBundle.*.tar.gz file to the various clients
 * On ASR Manager Server, register the hosts with asr activate_asset -h <hostname>