
IBM AIX/UNIX system storage administration ksh/perl scripting

Monday, May 25, 2009

Which Process Is Using the Most CPU Resources


How can you determine which process is using up the most CPU time?



The following commands and tools can be used to find which process is using the most CPU resources.

1. topas -P

In the topas -P output, the process called "cpu-eater" is the top consumer of CPU resources.

2. tprof -x sleep 10; vi
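
As a rough cross-check alongside the tools above, the output of ps can be sorted by its CPU column. The sketch below is an illustration only: top_cpu is a hypothetical helper name, and it assumes input in the layout produced by "ps -eo pcpu,pid,comm" (one header line, CPU percentage in the first column). The way ps computes that percentage may differ from what topas reports, so treat the result as a hint rather than a measurement.

```shell
# top_cpu (hypothetical helper): read "ps -eo pcpu,pid,comm" style output
# on stdin, drop the header line, and print the N busiest entries (default 5).
top_cpu() {
    sed 1d | sort -rn -k1,1 | head -n "${1:-5}"
}

# On a live system this would be used as:
#   ps -eo pcpu,pid,comm | top_cpu 3
```

The helper only reads standard input, so it can be tried on saved ps output as well as on a live system.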

bosboot fails with malloc error 0301-106


During or after an OS upgrade, bosboot fails with the following error:

0301-106 /usr/lib/boot/bin/mkboot_chrp the malloc call failed for size

0301-158 bosboot: mkboot failed to create bootimage.

0301-165 bosboot: WARNING! bosboot failed - do not attempt to boot device.




Recently upgraded AIX OS

Diagnosing the problem

Check size of ODM class file...


# ls -al /usr/lib/objrepos/PdDv*
-rw-r--r-- 1 root system 110592 Apr 14 11:42 PdDv
-rw-r--r-- 1 root system 200937472 Apr 14 11:42

Resolving the problem

bosboot uses the PdDv ODM class files to build device information into the boot image and pre-allocate memory for these devices. If the file is too large, malloc cannot satisfy the request, causing bosboot to fail.
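
The size check can be scripted so that an oversized class file is flagged before bosboot is run. This is a minimal sketch: odm_file_too_big is a hypothetical helper, and the 100 MB default limit is an arbitrary illustrative threshold, not a documented bosboot limit.

```shell
# odm_file_too_big (hypothetical helper): succeed when FILE is larger than
# LIMIT bytes. The 104857600 (100 MB) default is an assumed threshold chosen
# for illustration only.
odm_file_too_big() {
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt "${2:-104857600}" ]
}

# Example use on AIX:
#   for f in /usr/lib/objrepos/PdDv*; do
#       odm_file_too_big "$f" && echo "$f looks oversized"
#   done
```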

The following instructions can be used to reduce the size of the file:

# mkdir /tmp/objrepos
# cd /tmp/objrepos
# export ODMDIR=/usr/lib/objrepos
# odmget PdDv > PdDv.out
# cp /usr/lib/objrepos/PdDv /usr/lib/objrepos/PdDv.bak
# cp /usr/lib/objrepos/ /usr/lib/objrepos/
# export ODMDIR=/tmp/objrepos
# echo $ODMDIR
# odmcreate -c /usr/lib/cfgodm.ipl
# ls -l PdDv*
# odmadd /tmp/objrepos/PdDv.out
# ls -l PdDv*
# cp /tmp/objrepos/PdDv /usr/lib/objrepos/PdDv
# cp /tmp/objrepos/ /usr/lib/objrepos/
# export ODMDIR=/etc/objrepos
# rm -rf /tmp/objrepos

Using DBX and KDB to build stack traces


I have a hung process, how can I get a stack trace of it?



NOTE: Not every process that shows up in ps -ef can have a stack trace built for it. Old processes tend to be paged out of memory eventually, and then neither dbx nor kdb can be used to look at the stack trace for that process.

DBX Stack Trace Instructions for building a stack trace on a hung process:

In order to use dbx, the customer must first have the fileset
bos.adt.debug installed.

Attach to hung process
1. Capture console output, enter:

2. Enter:
ps -ef | grep

3. Enter:
dbx -a

4. Format trace, enter:

5. Leave dbx, enter:
detach (Typing quit will kill the process)

6. To leave script, type exit.
The script will be named typescript and will be located in the current
working directory.

Steps to obtain thread stack trace using kdb
Using the alog process as an example.

1) Start script session to capture data:
# script /tmp/kdb.out

2) Find the process id and convert it into hexadecimal:

# ps -ef | grep alog
root 1231 1 1 Jun 30 - 1:12 alog

Convert 1231 to Hexadecimal number
1231 converts to 4CF
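
The conversion can be done from the shell with printf. A small sketch; pid_to_hex is a hypothetical helper name, not an AIX command.

```shell
# pid_to_hex (hypothetical helper): print a decimal PID as uppercase hex,
# the form searched for in the kdb process table.
pid_to_hex() {
    printf '%X\n' "$1"
}

pid_to_hex 1231    # prints 4CF
```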

3) Start kdb

# kdb

4) Locate the process while in kdb

(0) p * | grep 4CF

pvproc+013800 78 alog ACTIVE 004E036 004A01E
0000000002525400 0 0001

5) find initial thread
(0) p (PSLOT) | grep pvthread
[The pslot is the second column. In the above example, it is 78.]

6) locate initial thread in 'p' output

THREAD..... threadlist :EA005E00

7) list function stack for initial thread
(0) f pvthread+005E00

8) Exit out of the script session
# exit
Data will be saved in /tmp/kdb.out.

The procstack command can also be used to print the stack of a process.

# ps -ef | grep alog
root 491752 450572 0 15:45:52 pts/4 0:00 alog

# procstack 491752
491752: alog
0xd0375da4 read(??, ??, ??) + 0x1a8
0x10001500 main(??, ??) + 0x11b0
0x10000198 __start() + 0x98

JFS2 Snapshot Quick Reference

This document is a quick guide to using snapshots of JFS2 filesystems.

The JFS2 snapshot command will create an image of a filesystem at a point in time, allowing the user to back up data from the snapshot rather than from the original filesystem. This allows backing up data without having to stop using it first.
The snapshot uses a "copy on write" approach. While the snapshot is being created, the source filesystem is quiesced to ensure a consistent copy, and only the filesystem structure is created in the snapshot. After that, whenever the source filesystem is modified, such as by a write or a delete, the original data is first copied into the snapshot.

Usually a snapshot filesystem will only need to be 2-6% of the size of the original filesystem, due to this copy-on-write feature.

* Creating a snapshot:
Find out the size of the filesystem:

# lsfs -q /origfs
Name Nodename Mount Pt VFS Size Options Auto
/dev/fslv02 -- /origfs jfs2 4194304 rw,cio no
(lv size: 4194304, fs size: 4194304, block size: 4096, sparse files: yes, inline log: no, inline log size: 0, reserved: 0, reserved: 0, DMAPI: no, VIX: yes)

In the lsfs -q output the size is reported in 512-byte blocks, so in the above example the filesystem and logical volume are 2 GB in size. We'll make the snapshot filesystem about 204 MB (10% of the original).
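
The block arithmetic can be checked in the shell. This is a small sketch; blocks_to_mb and snap_blocks are hypothetical helper names, and the division by 2048 relies on there being 2048 blocks of 512 bytes in one MB.

```shell
# blocks_to_mb (hypothetical helper): convert 512-byte blocks to whole MB.
blocks_to_mb() {
    echo $(( $1 / 2048 ))
}

# snap_blocks (hypothetical helper): size of a snapshot, in 512-byte blocks,
# as a percentage of the source filesystem (integer division).
snap_blocks() {
    echo $(( $1 * $2 / 100 ))
}

blocks_to_mb 4194304      # prints 2048 (2 GB)
snap_blocks 4194304 10    # prints 419430
blocks_to_mb 419430       # prints 204
```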

# snapshot -o snapfrom=/origfs -o size=419430
Snapshot for file system /origfs created on /dev/fslv05

* Mounting a snapshot:
# mount -v jfs2 -o snapshot /dev/fslv05 /mysnap

* Finding out if a fs has a snapshot already:
# snapshot -q /origfs

Snapshots for /origfs
Current Location 512-blocks Free Time
* /dev/fslv05 419430 418662 Fri Apr 21 08:30:36 PDT 2006

* Deleting a snapshot:

# snapshot -d /dev/fslv05
rmlv: Logical volume fslv05 is removed

For further information see the man page for the snapshot command.

Friday, May 22, 2009

Replacing a disk in an SSA RAID5 Array


OS level: 4.3.x - 5.x
SSA Raid 5 Array


How do I replace a disk in an SSA RAID5 array?


If the disk has not been rejected from the array:
1) Enter smitty ssaraid and select the following:
--> Change Member Disks in an SSA RAID Array
--> Remove a disk from an SSA RAID Array
--> Select the array in question and remove pdisk#...

The following steps apply to both rejected and non-rejected disks:
2) Have the CE physically replace the disk (the CE should first put it in service mode).
3) rmdev -dl pdisk# ; cfgmgr -vl ssar
4) Enter smitty ssaraid and select the following:
--> Change/Show use of an SSA physical disk
--> Change the new pdisk's "current use" to Array Candidate.
5) Enter smitty ssaraid and select the following:
--> Change Member Disks in an SSA RAID Array
--> Add a disk to an SSA RAID Array
--> Add the new pdisk definition to the array

The array should now be in a "Degraded" state. After the new disk is added, the array will go into a "Rebuilding" state, and ultimately a "Good" state once the rebuild operation is complete. Progress can be monitored via:

smitty ssaraid
--> List Status of all Defined SSA RAID Arrays
--> These numbers will get smaller as the array rebuilds; once they all reach zero, the array should be in a "Good" state.

This menu does not update dynamically. You will have to exit and then go back in to see the progress.

Wednesday, May 06, 2009

Determine which process is using a specific network port on AIX with or without lsof

Method 1. Using lsof.

This is the easiest method if lsof is installed.

# lsof -i:32876
oracle 135744 oracle7 13u IPv4 0x70bbe200 0t11 UDP loopback:32876

# ps -ef|grep 135744
oracle7 135744 1 0 Apr 21 - 2:45 ora_pmon_ftc_p01

Method 2. Using netstat and rmsock

# netstat -Aan|grep 9991
f100020000626b98 tcp4 0 0 *.9991 *.* LISTEN

# rmsock f100020000626b98 tcpcb
The socket 0x626808 is being held by proccess 200928 (sysscand).

# ps -ef|grep 200928
root 200928 1 0 Apr 21 - 0:01 /opt/sysscan/bin/sysscand
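
A plain grep for the port number also matches longer ports and addresses that merely contain those digits. The sketch below matches only on the local address column and prints the socket address to hand to rmsock. pcb_for_port is a hypothetical helper, and it assumes the column layout shown in the netstat -Aan output above (address first, local address fifth).

```shell
# pcb_for_port (hypothetical helper): read "netstat -Aan" output on stdin and
# print the socket address (column 1) of every line whose local address
# (column 5) ends in ".PORT".
pcb_for_port() {
    awk -v port="$1" '$5 ~ ("\\." port "$") { print $1 }'
}

# On AIX this would be used as:
#   netstat -Aan | pcb_for_port 9991
# and the printed address passed to: rmsock <address> tcpcb
```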

Method 3. Using netstat and kdb.

# netstat -Aan|grep 9991
f100020000626b98 tcp4 0 0 *.9991 *.* LISTEN

# kdb
The specified kernel file is a 64-bit kernel
Preserving 1418431 bytes of symbol table
First symbol __mulh
0000000000001000 0000000003E0D050 start+000FD8
F00000002FF47600 F00000002FFDC940 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F100070F00000000 F100070F10000000 pvproc+000000
F100070F10000000 F100070F18000000 pvthread+000000
raddr.....0000000000724000 eaddr.....F200800030000000
size..............00040000 align.............00001000
valid..1 ros....0 fixlmb.1 seg....0 wimg...2
ERROR: Unable to acess nfs_syms

(0)> sockinfo f100020000626b98 tcpcb

The last few lines of the output show:

proc/fd: 49/0
proc/fd: fd: 0

pvproc+00C400 49*sysscand ACTIVE 00310E0 0000001 00000000285C7400 0 0001

(0)> hcal 00310E0
Value hexa: 000310E0 Value decimal: 200928

(0)> quit

# ps -ef|grep 200928
root 200928 1 0 Apr 21 - 0:01 /opt/sysscan/bin/sysscand
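
The hex-to-decimal step done with hcal inside kdb can also be done from the shell after leaving kdb. A small sketch; hex_to_dec is a hypothetical helper name.

```shell
# hex_to_dec (hypothetical helper): print a hex PID, as shown in the kdb
# process table, in decimal for use with ps.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

hex_to_dec 00310E0    # prints 200928
```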